Wednesday, June 19, 2019

Connecting my lab to Azure with NSX

How to set up a site-to-site VPN between NSX in my lab and Azure. This setup uses the most basic option to connect NSX to Azure: the Basic SKU for the Virtual Network Gateway.
The trigger was that I wanted to learn how to configure the VPN on the Azure side and to experiment with both on-premises VMs and VMs in Azure. My network has a consumer-grade router that does not have the right tools to set up a VPN to Azure (or I just did not get it to work).
On the NSX side, the blog from Chris Colotti helped a lot. It matches the capabilities of the Basic gateway in Azure perfectly. As the Basic gateway does not support BGP, I set up static routes (Azure does this automagically) on my home router to the NSX Edge for the vNet in Azure.
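For reference, the Azure side can also be scripted instead of clicked together in the portal. A sketch with the Azure CLI; all resource names, addresses, and the pre-shared key below are placeholders, not my actual lab values:

```shell
# Sketch of the Azure side of the site-to-site VPN (Azure CLI).
# Names and addresses are placeholders for illustration only.

# Basic-SKU Virtual Network Gateway (route-based; no BGP support)
az network vnet-gateway create \
  --resource-group lab-rg --name lab-vpn-gw \
  --vnet lab-vnet --public-ip-address lab-vpn-ip \
  --gateway-type Vpn --vpn-type RouteBased --sku Basic

# Local network gateway: the public IP of the home router and the
# on-premises (NSX) subnets behind it
az network local-gateway create \
  --resource-group lab-rg --name home-lab \
  --gateway-ip-address <home-public-ip> \
  --local-address-prefixes 192.168.10.0/24

# The IPsec connection itself; the shared key must match the
# IPsec configuration on the NSX Edge
az network vpn-connection create \
  --resource-group lab-rg --name home-to-azure \
  --vnet-gateway1 lab-vpn-gw --local-gateway2 home-lab \
  --shared-key <pre-shared-key>
```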
A Linux VM in Azure:

A traceroute from my desktop to the Azure VM:
The NSX settings:


So, this is working well. But when using NSX and Azure, your networks are probably very dynamic. The next step will be to replace this site-to-site VPN setup with one based on BGP instead of static routes.

vCenter appliance 6.5 to 6.7U1 upgrade

Yesterday I performed an upgrade of a vCenter appliance to 6.7U1. All went well, but one thing in the upgrade process surprised me. I got the following message:
The default partition '/' has only 4.1 GB of available space.
So the root partition did not have enough space for the backup of both the database and the historical data. You can provide an alternative path with enough free space to proceed.
While searching I found this article: "Export path provided does not have enough disk space" error upgrading to vCenter Server Appliance 6.0 (2113947), which has you add a new disk. I opted to look on the VCSA itself to see if there was an alternative:

Filesystem                                Size  Used Avail Use% Mounted on
devtmpfs                                   16G     0   16G   0% /dev
tmpfs                                      16G   32K   16G   1% /dev/shm
tmpfs                                      16G  684K   16G   1% /run
tmpfs                                      16G     0   16G   0% /sys/fs/cgroup
/dev/sda3                                  11G  6.0G  4.1G  60% /
tmpfs                                      16G  1.4M   16G   1% /tmp
/dev/sda1                                 120M   35M   80M  30% /boot
/dev/mapper/log_vg-log                     25G  3.6G   20G  16% /storage/log
/dev/mapper/seat_vg-seat                   50G  8.5G   39G  19% /storage/seat
/dev/mapper/autodeploy_vg-autodeploy       25G   57M   24G   1% /storage/autodeploy
/dev/mapper/imagebuilder_vg-imagebuilder   25G   45M   24G   1% /storage/imagebuilder
/dev/mapper/dblog_vg-dblog                 25G  3.5G   20G  15% /storage/dblog
/dev/mapper/db_vg-db                       25G  907M   23G   4% /storage/db
/dev/mapper/core_vg-core                   50G   52M   47G   1% /storage/core
/dev/mapper/netdump_vg-netdump            9.8G   23M  9.2G   1% /storage/netdump

/dev/mapper/updatemgr_vg-updatemgr         99G  3.3G   91G   4% /storage/updatemgr

So, plenty of space on the Update Manager disk, and that is what I used:
/storage/updatemgr was filled in as the path. This worked, and while the export ran I monitored the disk usage. When the export finished, the status was this:

Filesystem                                Size  Used Avail Use% Mounted on
devtmpfs                                   16G     0   16G   0% /dev
tmpfs                                      16G   32K   16G   1% /dev/shm
tmpfs                                      16G  684K   16G   1% /run
tmpfs                                      16G     0   16G   0% /sys/fs/cgroup
/dev/sda3                                  11G  6.0G  4.1G  60% /
tmpfs                                      16G  139M   16G   1% /tmp
/dev/sda1                                 120M   35M   80M  30% /boot
/dev/mapper/log_vg-log                     25G  3.6G   20G  16% /storage/log
/dev/mapper/seat_vg-seat                   50G  8.5G   39G  19% /storage/seat
/dev/mapper/autodeploy_vg-autodeploy       25G   58M   24G   1% /storage/autodeploy
/dev/mapper/imagebuilder_vg-imagebuilder   25G   45M   24G   1% /storage/imagebuilder
/dev/mapper/dblog_vg-dblog                 25G  3.5G   20G  15% /storage/dblog
/dev/mapper/db_vg-db                       25G  907M   23G   4% /storage/db
/dev/mapper/core_vg-core                   50G   52M   47G   1% /storage/core
/dev/mapper/netdump_vg-netdump            9.8G   23M  9.2G   1% /storage/netdump

/dev/mapper/updatemgr_vg-updatemgr         99G  5.9G   88G   7% /storage/updatemgr

So, YMMV, but this looks like a very simple alternative to adding a whole new disk.
If you have a very large vSphere environment, it may be better to add a disk before starting the upgrade procedure.
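The hunt for a mount point with enough free space can be scripted. A small sketch using sample lines from the df output above (GNU sort's -h understands the human-readable sizes); on a live VCSA you would pipe real `df -h` output in instead:

```shell
# Sketch: from df -h style output, pick the mount point with the most
# available space as a candidate export path. Sample lines only.
df_sample='/dev/mapper/log_vg-log        25G  3.6G  20G  16% /storage/log
/dev/mapper/seat_vg-seat                 50G  8.5G  39G  19% /storage/seat
/dev/mapper/updatemgr_vg-updatemgr       99G  3.3G  91G   4% /storage/updatemgr'

printf '%s\n' "$df_sample" \
  | sort -k4,4 -h -r \
  | head -1 \
  | awk '{print $6}'    # → /storage/updatemgr (most free space)
```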

(this still works with vCenter 7.0 U2 upgrades)

Thursday, January 25, 2018

Running an Azure Stack POC on the tiniest hardware

Azure Stack Development Kit is the test/dev version of Azure Stack. It can be used to develop cloud applications that will later be deployed to Azure or Azure Stack. Another use is building a small POC of Azure Stack.
Because I wanted to learn more about Azure Stack, I set up the Azure Stack Development Kit in my lab. My home lab is not very big on resources: the minimal requirements are 760GB of disk space and 96GB of memory, while my biggest host has 64GB of memory and runs ESXi.
So I tried to deploy it in a Windows VM with 55GB of memory on a datastore on a single spinning disk. That did not go well: it ran for several hours but failed, and was not able to finish even after several restarts. So, to be successful, use SSD storage. I now use a plain local Crucial MX300, which does the job. You need at least 400GB of free space on the datastore.
After some googling I found several blogs that showed how to change the memory settings of the Hyper-V VMs that make up Azure Stack. Most of the settings were for older versions of Azure Stack.

Some of the blogs I looked at:
And RTFM!


All steps are already documented somewhere in a blog on the internet (especially in the blogs mentioned above). This information may be specific to the version I used (1.0.171020.1).

There are two major changes that you need to make.

  • Change the minimal hardware requirements check so that your VM passes it.
  • Change the resources the Hyper-V VMs will use so that they fit within your host VM.

Change the minimal hardware requirements 

Mount the VHDX file and go to
<drive>:\CloudDeployment\NuGetStore
Copy the file Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18.nupkg to another location and unzip it (7-Zip).
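A .nupkg is just a zip archive, so any unzip tool will do. With the 7-Zip command line (assuming `7z` is on the PATH; I used the GUI on the deployment host) the extraction step looks roughly like:

```shell
# Extract the nupkg (a plain zip archive) into a working directory.
7z x Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18.nupkg \
  -oMicrosoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18
```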
Go to 
<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Roles\PhysicalMachines\Tests
There is one file: BareMetal.Tests.ps1
Edit it. In my version, line 521:
$physicalMachine.IsVirtualMachine | Should Be $true
Change $false to $true.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Infrastructure\BareMetal
Edit OneNodeRole.xml
Line 113:
<TimeServiceStartTimeoutInMinutes>6</TimeServiceStartTimeoutInMinutes>
Line 117:
<StartupTimeoutInMinutes>30</StartupTimeoutInMinutes>
Line 118:
<ShutdownTimeoutInMinutes>30</ShutdownTimeoutInMinutes>
These lines were changed when trying to get it to run on one spinning SATA disk; you probably do not need these changes.

Line 131:
<MinimumNumberOfCoresPerMachine>1</MinimumNumberOfCoresPerMachine>
Line 132:
<MinimumPhysicalMemoryPerMachineGB>24</MinimumPhysicalMemoryPerMachineGB>

Change the resources the Hyper-V VM's will use

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Infrastructure\Domain
Edit OneNodeRole.xml
Line 105:
<Node Role="Domain" Name="[PREFIX]-DC[{NN}]"  ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-DC[{NN}])}]" StartUpMemoryBytes="2147483648" ProcessorCount="1" DynamicMemory="True" HighlyAvailable="False" RefNodeId="" StoreVhdLocally="True" DeleteUnattend="False" SkipDomainJoin="True" SupportPreDomainCompletion="True" SkipCertUsingDSC="True">
The startup memory was halved, as was the ProcessorCount, and DynamicMemory was enabled.
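All of the OneNodeRole.xml edits that follow use the same pattern. As an illustration of what each edit does to a Node element (GNU sed shown as a sketch; I made the actual changes by hand in a text editor, and "Demo" is a made-up sample role):

```shell
# Illustration of the per-role edit: halve StartUpMemoryBytes, lower the
# ProcessorCount, and turn DynamicMemory on. Sample line only; the real
# targets are the Node elements in each OneNodeRole.xml.
node='<Node Role="Demo" StartUpMemoryBytes="4294967296" ProcessorCount="2" DynamicMemory="False">'
printf '%s\n' "$node" \
  | sed -E -e 's/StartUpMemoryBytes="4294967296"/StartUpMemoryBytes="2147483648"/' \
           -e 's/ProcessorCount="2"/ProcessorCount="1"/' \
           -e 's/DynamicMemory="False"/DynamicMemory="True"/'
# prints the edited line: halved memory, 1 vCPU, DynamicMemory enabled
```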

<drive>\scripts\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\ACS
Edit OneNodeRole.xml
Line 23:
<Node Role="ACS" Name="[PREFIX]-ACS[{NN}]" ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-ACS[{NN}])}]" StartUpMemoryBytes="8589934592" ProcessorCount="1" DynamicMemory="True" HighlyAvailable="True" RefNodeId="" DeleteUnattend="False">
DynamicMemory was enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\ADFS
Edit OneNodeRole.xml
Line 6:
<Node Role="ADFS" Name="[PREFIX]-ADFS[{NN}]" ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-ADFS[{NN}])}]" StartUpMemoryBytes="2147483648" ProcessorCount="1" DynamicMemory="True" HighlyAvailable="True" RefNodeId="" DeleteUnattend="False">
ProcessorCount set to 1 and DynamicMemory enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\BGP
Edit OneNodeRole.xml
<Node Role="BGP" Name="[PREFIX]-BGPNAT[{NN}]" ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-BGPNAT[{NN}])}]" StartUpMemoryBytes="2147483648" ProcessorCount="1" DynamicMemory="True" HighlyAvailable="True" RefNodeId="" DeleteUnattend="False" SkipCertUsingDSC="True">
ProcessorCount set to 1 and DynamicMemory enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\CA
Edit OneNodeRole.xml
<Node Role="CA" Name="[PREFIX]-CA[{NN}]"  ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-CA[{NN}])}]" StartUpMemoryBytes="1073741824" ProcessorCount="1" DynamicMemory="True" HighlyAvailable="True" RefNodeId="" DeleteUnattend="False" SkipCertUsingDSC="True">
ProcessorCount set to 1 and DynamicMemory enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\FabricRingServices
Edit OneNodeRole.xml
Line 137:
<Node Role="FabricRingServices" Name="[PREFIX]-Xrp[{NN}]"  ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-Xrp[{NN}])}]" StartUpMemoryBytes="4294967296" ProcessorCount="2" DynamicMemory="True" HighlyAvailable="True" RefNodeId="" DeleteUnattend="False">
The startup memory was halved, as was the ProcessorCount, and DynamicMemory was enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\Gateway
Edit OneNodeRole.xml
<Node Role="Gateway" Name="[PREFIX]-Gwy[{NN}]"  ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-Gwy[{NN}])}]" StartUpMemoryBytes="2147483648" ProcessorCount="1" DynamicMemory="True" HighlyAvailable="True" RefNodeId="" DeleteUnattend="False">
ProcessorCount set to 1 and DynamicMemory enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\NC
Edit OneNodeRole.xml
<Node Role="NC" Name="[PREFIX]-NC[{NN}]"  ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-NC[{NN}])}]" StartUpMemoryBytes="2147483648" ProcessorCount="1" DynamicMemory="True" HighlyAvailable="True" RefNodeId="" DeleteUnattend="False">
The startup memory was halved, as was the ProcessorCount, and DynamicMemory was enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\SeedRingServices
Edit OneNodeRole.xml
<Node Role="SeedRingServices" Name="[PREFIX]-ERCS[{NN}]"  ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-ERCS[{NN}])}]" StartUpMemoryBytes="2147483648" ProcessorCount="2" DynamicMemory="True" HighlyAvailable="False" RefNodeId="" StoreVhdLocally="True" DeleteUnattend="False" ConfigureProxy="[UseWebProxy]">
ProcessorCount set to 2 and DynamicMemory enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\SLB
Edit OneNodeRole.xml
<Node Role="SLB" Name="[PREFIX]-SLB[{NN}]"  ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-SLB[{NN}])}]" StartUpMemoryBytes="2147483648" ProcessorCount="2" DynamicMemory="True" HighlyAvailable="True" RefNodeId="" DeleteUnattend="False">
ProcessorCount set to 2 and DynamicMemory enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\SQL
Edit OneNodeRole.xml
<Node Role="SQL" Name="[PREFIX]-Sql[{NN}]"  ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-Sql[{NN}])}]" StartUpMemoryBytes="2147483648" ProcessorCount="2" DynamicMemory="True" HighlyAvailable="True" RefNodeId="" DeleteUnattend="False">
The StartUp memory was halved and DynamicMemory was enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\WAS
Edit OneNodeRole.xml
<Node Role="WAS" Name="[PREFIX]-WAS[{NN}]"  ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-WAS[{NN}])}]" StartUpMemoryBytes="2147483648" ProcessorCount="1" DynamicMemory="True" HighlyAvailable="True" RefNodeId="" DeleteUnattend="False">
The startup memory was halved, as was the ProcessorCount, and DynamicMemory was enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\WASPublic
Edit OneNodeRole.xml
<Node Role="WASPUBLIC" Name="[PREFIX]-WASP[{NN}]"  ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-WASP[{NN}])}]" StartUpMemoryBytes="4294967296" ProcessorCount="2" DynamicMemory="True" HighlyAvailable="True" RefNodeId="" DeleteUnattend="False">
The StartUp memory was halved and DynamicMemory was enabled.

With these changes I got it working.
What is next:

  • Delete everything and deploy the latest version
  • Deploy the SQL Server resource provider and the App Service resource provider.






Wednesday, March 29, 2017

Space reclamation with vSphere 6.5 and NetApp storage

In vSphere 6.5 VMware re-enabled automatic space reclamation. Automatic space reclamation was first introduced in vSphere 5.0, but it caused a lot of problems with some arrays, and in the next update VMware disabled the automatic part. Since then I have not seen many users running space reclamation by hand. But now it's back, although you have to enable it; it is not enabled by default.
In this post I will show how to enable Automatic Space Reclamation with the setup in my lab and do some tests.

There are of course already some very good blogs about this topic:
Automatic space reclamation (UNMAP) is back in vSphere 6.5
WHAT’S NEW IN ESXI 6.5 STORAGE PART I: UNMAP

And the vSphere documentation:
Storage Space Reclamation

First some bad news: you will need the latest version of VMFS (version 6) and there is no in-place upgrade from VMFS 5. So you need to create a new datastore.

On the datastore there is a new setting:

SSD2 is a datastore on a local SSD.

From the ESXi host you can check the settings.
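The same check can be done from the ESXi shell with esxcli. A sketch as I understand the 6.5 reclaim namespace, with my lab's datastore name filled in:

```shell
# Query the automatic unmap (space reclamation) configuration of a
# VMFS6 datastore; "SSD2" is the datastore label in my lab.
esxcli storage vmfs reclaim config get --volume-label=SSD2

# Enable it by setting the reclaim priority (in 6.5 the choices are,
# as far as I know, none and low):
esxcli storage vmfs reclaim config set --volume-label=SSD2 --reclaim-priority=low
```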
Deploy a Windows 2016 VM on the datastore.
It has two disks: the boot drive and a test drive. The test drive is formatted with Microsoft's new filesystem, ReFS.
The boot disk has 10GB of data on it; the second disk is empty.
At this moment the datastore has 14GB of space used.
Let's copy some data to the test drive: some ISO files.
Now, let's look at the datastore.


The VMDK has grown with the data we put in it.
For Windows to send the correct unmap command, the disk should be recognized as a thin disk.
And it is.

Next, delete the data.

Has the VMDK shrunk?
No, it has not.
Now, test with the same VMDK reformatted with NTFS.

Repeat the same trick: Copy data, then delete.
Before the copy:

After the copy:

After the delete:
And there it is: The VMDK has shrunk.
So on my local SSD this works with NTFS.
Now add a NetApp FAS iSCSI LUN and use it as a datastore. The same test VM will get a new disk on that datastore. As this is not a manual on how to create a LUN on a NetApp controller, we will skip that part, but there is one very crucial setting: the -space-allocation enabled setting on the LUN. Without it, the ESXi host will not be able to detect that the NetApp controller can handle unmap commands.
The lun created:
Use this command to change the setting:
lun modify -path /vol/iSCSI_DS_vol/iSCSI_DS -space-allocation enabled
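Afterwards the setting can be verified from the ONTAP CLI; a sketch using the same LUN path as above:

```shell
# Check that space allocation is now enabled on the LUN (ONTAP CLI)
lun show -path /vol/iSCSI_DS_vol/iSCSI_DS -fields space-allocation
```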

An empty LUN to start with:
After copy:
After delete, almost empty again:
As I am testing on a simulator, the LUN does not shrink immediately. How fast this works on a real production system, I don't know yet.

From a storage efficiency standpoint this is a great feature, but you have to be willing to go thin-on-thin. Not many administrators want to run a manual unmap, but when it happens automatically it will be used.
I also tested with an NFS datastore, but there Windows does not recognize the VMDK as a thin disk and the VMDK is not shrunk (tested without the NetApp NFS VAAI plugin).