Thursday, January 25, 2018

Running an Azure Stack POC on the tiniest hardware

Azure Stack Development Kit is the test/dev version of Azure Stack. It can be used to develop cloud applications that will later be deployed to Azure or Azure Stack. Another use is building a small POC of Azure Stack.
Because I wanted to learn more about Azure Stack, I set up the Azure Stack Development Kit in my lab. My home lab is not very big on resources: the minimum requirements are 760GB of disk space and 96GB of memory, while my biggest host has 64GB of memory and runs ESXi.
So I tried to deploy it in a Windows VM with 55GB of memory, on a datastore backed by a single spinning disk. That did not go well: it ran for several hours but failed, and even several restarts could not get it to finish. To be successful, use SSD storage. I now use a plain local Crucial MX300, which does the job. You need at least 400GB of free space on the datastore.
After some googling I found several blogs that show how to change the memory settings of the Hyper-V VMs that make up Azure Stack, although most of the settings were for older versions of Azure Stack.

Some of the blogs I looked at:
And RTFM!


All these steps are already described somewhere in a blog on the internet (especially in the blogs mentioned above). Note that this information may be specific to the version I used (1.0.171020.1).

There are two major changes that you need to make.

  • Change the minimal hardware requirements check so that your VM passes it.
  • Change the resources the Hyper-V VMs will use so that they fit within your host VM.

Change the minimal hardware requirements 

Mount the VHDX file and go to:
<drive>:\CloudDeployment\NuGetStore
Copy the file Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18.nupkg to another location and unzip it (with 7-Zip, for example).
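A .nupkg file is just a ZIP archive, so this step can also be scripted. A small Python sketch (the paths in the usage comment are illustrative; adjust them to your environment):

```python
import zipfile

def unpack_nupkg(nupkg_path: str, dest_dir: str) -> None:
    """Extract a .nupkg (which is a plain ZIP archive) to dest_dir."""
    with zipfile.ZipFile(nupkg_path) as archive:
        archive.extractall(dest_dir)

# Usage (illustrative paths):
# unpack_nupkg(
#     r"D:\CloudDeployment\NuGetStore\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18.nupkg",
#     r"D:\unpacked\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18",
# )
```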
Go to 
<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Roles\PhysicalMachines\Tests
There is one file: BareMetal.Tests.ps1
Edit it. In my version, line 521 reads:
$physicalMachine.IsVirtualMachine | Should Be $false
Change $false to $true so the check passes inside a virtual machine:
$physicalMachine.IsVirtualMachine | Should Be $true
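If you prefer to script this edit, a minimal Python sketch (the function name is mine; the assertion text comes from BareMetal.Tests.ps1 as quoted above):

```python
# Minimal sketch: flip the IsVirtualMachine assertion so the bare-metal
# hardware check accepts a virtual machine. Operates on the file's text.

def patch_baremetal_check(text: str) -> str:
    return text.replace(
        "$physicalMachine.IsVirtualMachine | Should Be $false",
        "$physicalMachine.IsVirtualMachine | Should Be $true",
    )

# Usage (path illustrative):
# from pathlib import Path
# p = Path(r"content\Roles\PhysicalMachines\Tests\BareMetal.Tests.ps1")
# p.write_text(patch_baremetal_check(p.read_text()))
```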

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Infrastructure\BareMetal
Edit OneNodeRole.xml
Line 113:
<TimeServiceStartTimeoutInMinutes>6</TimeServiceStartTimeoutInMinutes>
Line 117:
<StartupTimeoutInMinutes>30</StartupTimeoutInMinutes>
Line 118:
<ShutdownTimeoutInMinutes>30</ShutdownTimeoutInMinutes>
These lines were changed while I was trying to get the deployment to run on a single spinning SATA disk; you probably do not need these changes.

Line 131:
<MinimumNumberOfCoresPerMachine>1</MinimumNumberOfCoresPerMachine>
Line 132:
<MinimumPhysicalMemoryPerMachineGB>24</MinimumPhysicalMemoryPerMachineGB>
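This kind of change can be scripted as well. A hedged Python sketch (the element names come from OneNodeRole.xml as quoted above; the helper function is mine and assumes the elements are not namespace-qualified):

```python
import xml.etree.ElementTree as ET

def relax_minimums(xml_text: str, cores: int = 1, memory_gb: int = 24) -> str:
    """Lower the minimum-hardware values in a OneNodeRole.xml document.

    Assumes the elements carry no XML namespace; if they do, the tag
    names below need the namespace prefix.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter("MinimumNumberOfCoresPerMachine"):
        elem.text = str(cores)
    for elem in root.iter("MinimumPhysicalMemoryPerMachineGB"):
        elem.text = str(memory_gb)
    return ET.tostring(root, encoding="unicode")
```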

Change the resources the Hyper-V VM's will use

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Infrastructure\Domain
Edit OneNodeRole.xml
Line 105:
<Node Role="Domain" Name="[PREFIX]-DC[{NN}]"  ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-DC[{NN}])}]" StartUpMemoryBytes="2147483648" ProcessorCount="1" DynamicMemory="True" HighlyAvailable="False" RefNodeId="" StoreVhdLocally="True" DeleteUnattend="False" SkipDomainJoin="True" SupportPreDomainCompletion="True" SkipCertUsingDSC="True">
The StartUpMemoryBytes value was halved, the ProcessorCount was halved, and DynamicMemory was enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\ACS
Edit OneNodeRole.xml
Line 23:
<Node Role="ACS" Name="[PREFIX]-ACS[{NN}]" ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-ACS[{NN}])}]" StartUpMemoryBytes="8589934592" ProcessorCount="1" DynamicMemory="True" HighlyAvailable="True" RefNodeId="" DeleteUnattend="False">
DynamicMemory was enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\ADFS
Edit OneNodeRole.xml
Line 6:
<Node Role="ADFS" Name="[PREFIX]-ADFS[{NN}]" ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-ADFS[{NN}])}]" StartUpMemoryBytes="2147483648" ProcessorCount="1" DynamicMemory="True" HighlyAvailable="True" RefNodeId="" DeleteUnattend="False">
The ProcessorCount was set to 1 and DynamicMemory was enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\BGP
Edit OneNodeRole.xml
<Node Role="BGP" Name="[PREFIX]-BGPNAT[{NN}]" ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-BGPNAT[{NN}])}]" StartUpMemoryBytes="2147483648" ProcessorCount="1" DynamicMemory="True" HighlyAvailable="True" RefNodeId="" DeleteUnattend="False" SkipCertUsingDSC="True">
The ProcessorCount was set to 1 and DynamicMemory was enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\CA
Edit OneNodeRole.xml
<Node Role="CA" Name="[PREFIX]-CA[{NN}]"  ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-CA[{NN}])}]" StartUpMemoryBytes="1073741824" ProcessorCount="1" DynamicMemory="True" HighlyAvailable="True" RefNodeId="" DeleteUnattend="False" SkipCertUsingDSC="True">
The ProcessorCount was set to 1 and DynamicMemory was enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\FabricRingServices
Edit OneNodeRole.xml
Line 137:
<Node Role="FabricRingServices" Name="[PREFIX]-Xrp[{NN}]"  ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-Xrp[{NN}])}]" StartUpMemoryBytes="4294967296" ProcessorCount="2" DynamicMemory="True" HighlyAvailable="True" RefNodeId="" DeleteUnattend="False">
The StartUpMemoryBytes value was halved, the ProcessorCount was halved, and DynamicMemory was enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\Gateway
Edit OneNodeRole.xml
<Node Role="Gateway" Name="[PREFIX]-Gwy[{NN}]"  ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-Gwy[{NN}])}]" StartUpMemoryBytes="2147483648" ProcessorCount="1" DynamicMemory="True" HighlyAvailable="True" RefNodeId="" DeleteUnattend="False">
The ProcessorCount was set to 1 and DynamicMemory was enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\NC
Edit OneNodeRole.xml
<Node Role="NC" Name="[PREFIX]-NC[{NN}]"  ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-NC[{NN}])}]" StartUpMemoryBytes="2147483648" ProcessorCount="1" DynamicMemory="True" HighlyAvailable="True" RefNodeId="" DeleteUnattend="False">
The StartUpMemoryBytes value was halved, the ProcessorCount was halved, and DynamicMemory was enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\SeedRingServices
Edit OneNodeRole.xml
<Node Role="SeedRingServices" Name="[PREFIX]-ERCS[{NN}]"  ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-ERCS[{NN}])}]" StartUpMemoryBytes="2147483648" ProcessorCount="2" DynamicMemory="True" HighlyAvailable="False" RefNodeId="" StoreVhdLocally="True" DeleteUnattend="False" ConfigureProxy="[UseWebProxy]">
The ProcessorCount was set to 2 and DynamicMemory was enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\SLB
Edit OneNodeRole.xml
<Node Role="SLB" Name="[PREFIX]-SLB[{NN}]"  ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-SLB[{NN}])}]" StartUpMemoryBytes="2147483648" ProcessorCount="2" DynamicMemory="True" HighlyAvailable="True" RefNodeId="" DeleteUnattend="False">
The ProcessorCount was set to 2 and DynamicMemory was enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\SQL
Edit OneNodeRole.xml
<Node Role="SQL" Name="[PREFIX]-Sql[{NN}]"  ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-Sql[{NN}])}]" StartUpMemoryBytes="2147483648" ProcessorCount="2" DynamicMemory="True" HighlyAvailable="True" RefNodeId="" DeleteUnattend="False">
The StartUpMemoryBytes value was halved and DynamicMemory was enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\WAS
Edit OneNodeRole.xml
<Node Role="WAS" Name="[PREFIX]-WAS[{NN}]"  ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-WAS[{NN}])}]" StartUpMemoryBytes="2147483648" ProcessorCount="1" DynamicMemory="True" HighlyAvailable="True" RefNodeId="" DeleteUnattend="False">
The StartUpMemoryBytes value was halved, the ProcessorCount was halved, and DynamicMemory was enabled.

<drive>\Microsoft.AzureStack.Solution.Deploy.CloudDeployment.1.0.597.18\content\Configuration\Roles\Fabric\WASPublic
Edit OneNodeRole.xml
<Node Role="WASPUBLIC" Name="[PREFIX]-WASP[{NN}]"  ProvisioningStatus="" Type="VirtualMachine" Id="[{CPI_GUID(-WASP[{NN}])}]" StartUpMemoryBytes="4294967296" ProcessorCount="2" DynamicMemory="True" HighlyAvailable="True" RefNodeId="" DeleteUnattend="False">
The StartUpMemoryBytes value was halved and DynamicMemory was enabled.
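All of the per-role edits above follow the same pattern: halve StartUpMemoryBytes, optionally lower ProcessorCount, and set DynamicMemory to True on each Node element. A Python sketch of that pattern (the attribute names come from the snippets above; the function is mine and assumes namespace-free XML):

```python
import xml.etree.ElementTree as ET

def shrink_nodes(xml_text, processor_count=None):
    """Halve StartUpMemoryBytes and enable DynamicMemory on every
    VirtualMachine Node element; optionally override ProcessorCount."""
    root = ET.fromstring(xml_text)
    for node in root.iter("Node"):
        if node.get("Type") != "VirtualMachine":
            continue
        memory = node.get("StartUpMemoryBytes")
        if memory is not None:
            node.set("StartUpMemoryBytes", str(int(memory) // 2))
        node.set("DynamicMemory", "True")
        if processor_count is not None:
            node.set("ProcessorCount", str(processor_count))
    return ET.tostring(root, encoding="unicode")
```

Run it once per OneNodeRole.xml, picking the ProcessorCount you want for that role.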

With these changes I got it working.
What is next:

  • Delete everything and deploy the latest version
  • Deploy the SQL Server resource provider and the App Service resource provider.






Wednesday, March 29, 2017

Space reclamation with vSphere 6.5 and NetApp storage

In vSphere 6.5 VMware re-enabled automatic space reclamation. Automatic space reclamation was first introduced in vSphere 5.0 but caused a lot of problems with some arrays, so in the next update VMware disabled the automatic part. Since then I have not seen many users running space reclamation by hand. But now it's back, although you have to enable it; it is not enabled by default.
In this post I will show how to enable automatic space reclamation with the setup in my lab and run some tests.

There are of course already some very good blogs about this topic:
Automatic space reclamation (UNMAP) is back in vSphere 6.5
WHAT’S NEW IN ESXI 6.5 STORAGE PART I: UNMAP

And the vSphere documentation:
Storage Space Reclamation

First some bad news: you will need the latest version of VMFS (version 6) and there is no in-place upgrade from VMFS 5, so you need to create a new datastore.

On the datastore there is a new setting:

SSD2 is a datastore on a local SSD.

From the ESXi host you can check the settings.
Deploy a Windows 2016 VM on the datastore.
It has two disks: the boot drive and a test drive. The test drive is formatted with Microsoft's new filesystem, ReFS.
The boot disk has 10GB of data on it; the second disk is empty.
At this moment the datastore has 14GB of space used.
Let's copy some data to the test drive: some ISO files.
So, let's look at the datastore.


The VMDK has grown with the data we put in it.
For Windows to send the correct UNMAP command, the disk must be recognized as a thin disk.
And it is.

Next, delete the data.

Has the VMDK shrunk?
No, it has not.
Now, test with the same VMDK reformatted with NTFS.

Repeat the same trick: Copy data, then delete.
Before the copy:

After the copy:

After the delete:
And there it is: The VMDK has shrunk.
So on my local SSD this works with NTFS.
Now add a NetApp FAS iSCSI LUN and use it as a datastore. The same test VM gets a new disk on this datastore. This is not a manual on how to create a LUN on a NetApp controller, so we will skip that part, but one setting is crucial: -space-allocation enabled on the LUN. Without it, the ESXi host cannot detect that the NetApp controller is able to handle the UNMAP commands.
The lun created:
Use this command to change the setting:
lun modify -path /vol/iSCSI_DS_vol/iSCSI_DS -space-allocation enabled

An empty LUN to start with:
After the copy:
After the delete, almost empty again:
As I am testing on a simulator, the LUN does not shrink immediately. How fast this works on a real production system I don't know yet.

From a storage-efficiency standpoint this is a great feature, but you have to be willing to go thin on thin. Not many administrators want to run a manual unmap, but when it happens automatically it will be used.
I also tested with an NFS datastore, but Windows does not recognize the VMDK as a thin disk and the VMDK does not shrink (tested without the NetApp NFS VAAI plugin).