Solving vShield Edge Gateways Not Upgrading/Re-deploying after vSM 5.0.1 to 5.1.2 Upgrade

After upgrading from vCloud Director 1.5.1 to 5.1.2, vShield Manager 5.0.1 to 5.1.2 and vSphere 5.0 to 5.1.0, following all of the Best Practices KBs for each, the time came to upgrade the vShield Edge Gateways to take advantage of some of the advanced capabilities and performance. When I attempted this via vCloud Director (right-click the Edge Gateway and choose ‘Re-deploy’), I was met with this error message:

Cannot redeploy edge gateway BizDev External Network (urn:uuid:f1e69daa-7b56-4e8b-8713-549cfbe8c9f7) org.springframework.web.client.RestClientException: Redeploy failed: Edge connected to ‘dvportgroup-9622’ failed to upgrade.

Inspecting the vCloud Director debug logs revealed this:

2013-05-29 07:42:56,316 | DEBUG | nf-activity-pool-192 | LoggingRestTemplate | Created POST request for "https://10.10.10.56:443/api/2.0/networks/dvportgroup-9622/edge/upgrade" |

2013-05-29 07:42:56,316 | DEBUG | nf-activity-pool-192 | LoggingRestTemplate | Request::URI:https://10.10.10.56/api/2.0/networks/dvportgroup-9622/edge/upgrade method:POST |
2013-05-29 07:42:56,316 | DEBUG | nf-activity-pool-192 | LoggingRestTemplate | Request body :<none> |
2013-05-29 07:42:56,406 | WARN | nf-activity-pool-192 | LoggingRestTemplate | POST request for "https://10.10.10.56:443/api/2.0/networks/dvportgroup-9622/edge/upgrade" resulted in 404 (Not Found); invoking error handler |
2013-05-29 07:42:56,406 | ERROR | nf-activity-pool-192 | NetworkSecurityErrorHandler | Response error xml : <?xml version="1.0" encoding="UTF-8" standalone="yes"?><Errors><Error><code>70001</code><description>vShield Edge not installed for given networkID. Cannot proceed with the operation</description></Error></Errors> |
2013-05-29 07:42:56,407 | DEBUG | nf-activity-pool-192 | EdgeManagerSpock | Failed upgrading edge connected to dvportgroup-9622. |
com.vmware.vcloud.fabric.nsm.error.VsmException: vShield Edge not installed for given networkID. Cannot proceed with the operation

at com.vmware.vcloud.fabric.nsm.error.NetworkSecurityErrorHandler.processException(NetworkSecurityErrorHandler.java:95)
 at com.vmware.vcloud.fabric.nsm.error.NetworkSecurityErrorHandler.handleError(NetworkSecurityErrorHandler.java:70)
 at org.springframework.web.client.RestTemplate.handleResponseError(RestTemplate.java:486)
 at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:443)
 at com.vmware.vcloud.fabric.net.utils.impl.LoggingRestTemplate.doExecute(LoggingRestTemplate.java:64)
 at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:401)
 at org.springframework.web.client.RestTemplate.postForEntity(RestTemplate.java:302)
 at com.vmware.vcloud.fabric.net.utils.impl.RestClient.postForLocation(RestClient.java:108)
 at com.vmware.vcloud.fabric.nsm.services.spock.EdgeManagerSpock.redeployEdge(EdgeManagerSpock.java:728)
 at com.vmware.vcloud.fabric.net.activities.gateway.DeployGatewayActivity$GenerateBacking.invoke(DeployGatewayActivity.java:347)
 at com.vmware.vcloud.fabric.foundation.activity.executors.ActivityRunner.run(ActivityRunner.java:123)
 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
2013-05-29 07:42:56,407 | ERROR | nf-activity-pool-192 | DeployGatewayActivity | [Activity Execution] Handle: urn:uuid:f1e69daa-7b56-4e8b-8713-549cfbe8c9f7, Current Phase: com.vmware.vcloud.fabric.net.activities.gateway.DeployGatewayActivity$GenerateBacking, ActivityExecutionState Parameter Names: [BACKING_SPEC, NDC, activitySupervisionRequest, com.vmware.activityEntityRecord.EntityId, REDEPLOY, DEPLOY_PARAMS] - Could not deploy gateway BizDev External Network |
org.springframework.web.client.RestClientException: Redeploy failed: Edge connected to 'dvportgroup-9622' failed to upgrade.
 at

-- snip --
2013-05-29 07:42:56,437 | DEBUG | LocalTaskScheduler-Pool-31 | JobString | Job object - Object : BizDev External Network(com.vmware.vcloud.entity.gateway:d21b172b-b926-46e7-8e8b-07fb71843b18) operation name: NETWORK_GATEWAY_REDEPLOY | vcd=83908311-0f60-48e3-a2ec-f10f07c4f187,task=b6261962-0d14-48b0-836b-45fc0d68df65
2013-05-29 07:42:56,486 | DEBUG | LocalTaskScheduler-Pool-31 | CJob | No last pending job : [BizDev External Network(com.vmware.vcloud.entity.gateway:d21b172b-b926-46e7-8e8b-07fb71843b18)], status=[3] | vcd=83908311-0f60-48e3-a2ec-f10f07c4f187,task=b6261962-0d14-48b0-836b-45fc0d68df65
2013-05-29 07:42:56,487 | DEBUG | LocalTaskScheduler-Pool-31 | CJob | Update last job : [BizDev External Network(com.vmware.vcloud.entity.gateway:d21b172b-b926-46e7-8e8b-07fb71843b18)], status=[3], [5/29/13 7:42 AM] | vcd=83908311-0f60-48e3-a2ec-f10f07c4f187,task=b6261962-0d14-48b0-836b-45fc0d68df65
2013-05-29 07:42:56,487 | DEBUG | LocalTaskScheduler-Pool-31 | TaskServiceImpl | Cleaning busy entities for task 'b6261962-0d14-48b0-836b-45fc0d68df65' | vcd=83908311-0f60-48e3-a2ec-f10f07c4f187,task=b6261962-0d14-48b0-836b-45fc0d68df65
2013-05-29 07:42:56,488 | DEBUG | LocalTaskScheduler-Pool-31 | BusyObjectServiceImpl | Unsetting 1 busy entitie(s) for task ref NETWORK_GATEWAY_REDEPLOY(com.vmware.vcloud.entity.task:b6261962-0d14-48b0-836b-45fc0d68df65) | vcd=83908311-0f60-48e3-a2ec-f10f07c4f187,task=b6261962-0d14-48b0-836b-45fc0d68df65
2013-05-29 07:42:56,492 | DEBUG | LocalTaskScheduler-Pool-31 | TaskServiceImpl | Recorded completion of task 'NETWORK_GATEWAY_REDEPLOY(com.vmware.vcloud.entity.task:b6261962-0d14-48b0-836b-45fc0d68df65)' (retry count: 1) | vcd=83908311-0f60-48e3-a2ec-f10f07c4f187,task=b6261962-0d14-48b0-836b-45fc0d68df65
2013-05-29 07:42:56,494 | INFO | LocalTaskScheduler-Pool-31 | LocalTask | completed executing local task NETWORK_GATEWAY_REDEPLOY(com.vmware.vcloud.entity.task:b6261962-0d14-48b0-836b-45fc0d68df65) |
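
If you need to dig the same failure out of a much larger vCloud Director debug log, a small script can help. What follows is only a minimal Python sketch, assuming the pipe-delimited layout seen in the excerpt above (timestamp | level | thread | logger | message). The function names (parse_line, vsm_errors, summarize) are made up for illustration and are not part of any VMware tooling.

```python
import re
import xml.etree.ElementTree as ET

# Two lines copied from the debug log excerpt above.
SAMPLE_LOG = [
    '2013-05-29 07:42:56,316 | DEBUG | nf-activity-pool-192 | LoggingRestTemplate | '
    'Request body :<none> |',
    '2013-05-29 07:42:56,406 | ERROR | nf-activity-pool-192 | NetworkSecurityErrorHandler | '
    'Response error xml : <?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Errors><Error><code>70001</code><description>vShield Edge not installed for given '
    'networkID. Cannot proceed with the operation</description></Error></Errors> |',
]

# Matches the <Errors> document that vShield Manager returns in the response body.
ERRORS_XML = re.compile(r"<Errors>.*?</Errors>", re.DOTALL)


def parse_line(line):
    """Split a log line into timestamp, level, thread, logger and message.

    Returns None for lines that do not have the five-field layout
    (stack trace frames, for example).
    """
    parts = line.split(" | ", 4)
    if len(parts) < 5:
        return None
    timestamp, level, thread, logger, message = parts
    return {
        "timestamp": timestamp,
        "level": level.strip(),
        "thread": thread,
        "logger": logger,
        "message": message.rstrip(" |"),
    }


def vsm_errors(message):
    """Return (code, description) pairs from an embedded <Errors> document."""
    match = ERRORS_XML.search(message)
    if match is None:
        return []
    root = ET.fromstring(match.group(0))
    return [(e.findtext("code"), e.findtext("description")) for e in root.iter("Error")]


def summarize(lines):
    """Collect every vShield Manager error found in the given log lines."""
    found = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        for code, description in vsm_errors(entry["message"]):
            found.append((entry["timestamp"], entry["logger"], code, description))
    return found


if __name__ == "__main__":
    for timestamp, logger, code, description in summarize(SAMPLE_LOG):
        print(f"{timestamp} {logger}: [{code}] {description}")
```

To run it against a real log, pass the lines of the file to summarize() instead of SAMPLE_LOG. Stack trace frames and other lines without the five-field layout are skipped.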

What I quickly realized is that it also affected the ability to modify any existing Edge Gateway IP/NAT/Firewall/VPN settings. If it were just the upgrade that was affected, I probably would have left it for another day.

Through all my searching, I could not find anyone who had a solution that worked for me and most posts ended up saying “call VMware support”. Well, I’m a glutton for punishment and often don’t know when to give up, so I kept at it and I was able to get it working.

I shut down the new vShield Manager VM and rolled back to the snapshot I took of the original vShield Manager VM after the vCloud Director upgrade but before the vShield upgrade. I then started to go through the steps again in this VMware KB, with a few deviations: Upgrading to vCloud Networking and Security 5.1.2a best practices guide.

Even though I had enough space to run the main upgrade bundle, I ran the space clearing VMware-vShield-Manager-upgrade-bundle-maintenance-5.0-939118.tar.gz bundle anyway. After that finished, I ran the main 5.1.2 upgrade bundle (VMware-vShield-Manager-upgrade-bundle-5.1.2-943471.tar.gz).

Before running the KB’s backup, deploy new OVF, restore, and maintenance bundle routine, I upgraded each Edge Gateway from the vShield Manager web UI (under the Edges dropdown), which worked! In essence, this is a simple re-deploy of a new OVF of the gateway and reconfiguration of the service template with the latest version from the new vShield Manager.

Then I installed the VMware-vShield-Manager-upgrade-bundle-maintenance-5.1.2-997359.tar.gz bundle. Once everything had booted back up and was stable, I stopped vCloud Director, took a backup of vSM, deployed the new vSM OVF, installed the same VMware-vShield-Manager-upgrade-bundle-maintenance-5.1.2-997359.tar.gz bundle on the new install, restored the backup, re-registered vSM with vCenter, started vCD, and re-registered vCD with vSM.

Hope this helps someone out.

Upgrading to vCloud Director 5.1 with Existing Nested ESXi VMs

While my upgrade from vCloud Director 1.5.1 to 5.1 went on throughout the day, I started to have a sinking feeling that I wasn’t going to be able to complete it with zero downtime for all of the VMs in the environment.

In our environment, a lot of training and product demos happen, and much of that relies on utilizing nested ESXi, similar to how VMware’s Hands On Labs are run at VMworld (and thankfully, now available online outside of the event).

William Lam has a great article on modifying your vCloud Director database to automatically pass the ‘nested hypervisor’ support flag to vCloud hosts as they’re brought into vCD to be used as a resource, rather than having to modify each vSphere host’s config file. http://www.virtuallyghetto.com/2011/10/missing-piece-in-creating-your-own.html

However, with vSphere 5.1, VMware changed how nested ESXi is enabled. It’s now on a per VM basis rather than a per host basis. William’s post “How to Enable Nested ESXi & Other Hypervisors in vSphere 5.1” covers the changes and the new process quite well, so I won’t cover that here.

The biggest kicker to this is that it requires the VM to be at VMware Hardware Version 9, which is new to vSphere 5.1. So, any existing nested ESXi VM (or any other nested hypervisor) is running, at highest, Hardware Version 8.

Change Virtual Machine SCSI Controller Type in NexentaStor VSA

Before I say anything, I shouldn’t need to say this, but I will. This is not supported. Now, on to the fun!

The current release of NexentaStor (v3.1.4.1) is made available as an OVA to make it easy to import into VMware environments. Currently this only “works” in full-blown vSphere hosts and not Fusion/Workstation/Player (“works”, because with some finagling, you can get it running in Fusion – don’t have access to Workstation/Player at the moment). Ok, already getting off track. This OVA comes with the following hardware configuration:

1 vCPU
2GB RAM
1 x 8GB Hard Drive (syspool)
– this is configured with a VMware Paravirtual controller
1 Virtual Nic
– this is configured as a VMXNET3 device

IP Multipathing Setup In Nexenta

A feature added in NexentaStor 3.1.4 is the ability to configure IP Multipathing (IPMP) groups via the management console (NMV) rather than having to drop to the shell and configure it manually.

IPMP has two purposes: fault-tolerance and outbound traffic load spreading. While there’s a lot of overlap between Link Aggregation and IPMP, there are some key differences. For more on that, you can read Nicolas Droux’s great write up:
https://blogs.oracle.com/droux/entry/link_aggregation_vs_ip_multipathing.

By default, NMV creates IPMP groups with link-based failure detection rather than probe-based. Link-based detection is lighter than probe-based, as it relies on the lower-level link state rather than probing a test IP address.


Nexenta VAAI-NAS Beta Released, NFS Hardware Acceleration


Along with the release of NexentaStor 3.1.4, Nexenta Systems today officially released the (very) Beta VAAI-NAS plugin for VMware vSphere 5.x via the community NexentaStor.org forums. VAAI-NAS is still not widely supported in the NAS world, and of the vendors that do support it, not all support every primitive. You can search the VMware Compatibility Guide for vendors that are VAAI-NAS certified.

VAAI, to catch up, is the suite of primitives (instructions) that allow vSphere to offload certain VM operations to the array. For NAS Hardware Acceleration, these are:

  • Full File Clone – Enables virtual disks to be cloned by the NAS device (but not ‘hot’, the VM must be powered off).
  • Native Snapshot Support – Allows creation of virtual machine snapshots to be offloaded to the array.
  • Extended Statistics – Shows actual space usage on NAS datastores (great for thin provisioning).
  • Reserve Space – Enables creation of thick virtual disk files on NAS.

Everything you wanted to know about VAAI (but were afraid to ask)
http://www.vmware.com/files/pdf/techpaper/VMware-vSphere-Storage-API-Array-Integration.pdf

At this point, all primitives are working (or supposed to, it’s beta, right?) save for the Native Snapshots.

Here’s a quick tutorial to install the agent in NexentaStor and the plugin in VMware vSphere.


PowerCLI Mass Add Hard Disks to Virtual Machine

While doing some iSCSI LUN testing for a certain storage vendor, I was looking for a way to add multiple hard disks to a single VM across each iSCSI LUN whose name matched a certain pattern. In my case, all LUNs I was testing against had the full LUN path in their name, so they were similar to lun1.naa.600144f0dcb8480000005142553e0001 (thanks to Alan Renouf’s post “PowerCLI: Mass provision datastore’s” for guidance on scripting datastore creation).

However, I do not have all LUNs mapped to every vSphere host. Easy enough to get around this in PowerCLI. The following script prompts for the Virtual Machine name, size and hard disk format. It then filters the datastores by that VM’s vSphere host and our common string in the datastore name.


# Prompt for the target VM, disk size and disk format
$vmname = Read-Host "VM Name to add disks to"
$vm = Get-VM -Name $vmname
$size = Read-Host "Disk Size (GB)"
$format = Read-Host "Disk Format (Thin, Thick, EagerZeroedThick)"

# Keep only datastores visible to the VM's host whose names match the LUN naming pattern
$datastores = $vm | Get-VMHost | Get-Datastore | Where-Object {$_.Name -like "lun*naa*"}

# Add one new disk per matching datastore
foreach ($item in $datastores) {
    $datastore = $item.Name
    Write-Host "Adding new $size GB VMDK to $vm on datastore $datastore"
    New-HardDisk -VM $vm -CapacityGB $size -Datastore $datastore -StorageFormat $format
}

There are a lot of parameters for the New-HardDisk cmdlet that I don’t specify because the defaults were what I already wanted (e.g. Persistence, Controller, DiskType, etc.). Some, like StorageFormat which defaults to Thick Lazy Zeroed, I wanted to control.

In another case, I wanted to add multiple disks from one datastore to a VM.


### Get VM/Disk Count/Datastore information ###
$vmname = Read-Host "VM Name to add disks to"
$num_disks = [int](Read-Host "Number of disks to add")
$ds = Read-Host "Datastore to place the VMDK"
$format = Read-Host "Disk Format (Thin, Thick, EagerZeroedThick)"
$size = Read-Host "Disk Size (GB)"

$vm = Get-VM -Name $vmname
$datastore = Get-Datastore -Name $ds

### Add $num_disks disks to the VM ###
for ($x = 0; $x -lt $num_disks; $x++) {
    Write-Host "Adding $size GB VMDK to $vm on datastore $datastore"
    New-HardDisk -VM $vm -CapacityGB $size -Datastore $datastore -StorageFormat $format
}

You can read more about the New-HardDisk cmdlet at:
http://www.vmware.com/support/developer/PowerCLI/PowerCLI51/html/New-HardDisk.html

VMs Grayed Out (Inaccessible) After NFS Datastore Restored

[Added new workaround]

While working with a customer last week with Mike Letschin, we discovered an issue during one of their storage tests. It wasn’t a test that I’d normally seen done, but what the heck, let’s roll.

“What happens to all the VMs hosted on an NFS datastore when all NFS connectivity is lost for a certain period of time?”

Well, turns out, it depends on a couple things. Was the VM powered on? How long was the NFS datastore unavailable for?

Interop 2012 – HP Networking Innovations on Display

This post is a long time coming, but starting a new job will do that :)

I’ve long wanted to attend the Interop tech conference and was able to attend the Las Vegas installation this year by invite from HP/Ivy Worldwide. I was really hoping to make the Mumbai show, but I guess it wasn’t in the budget. It’s primarily a networking focused conference, but with datacenter virtualization technologies converging (you may have heard the term Converged Infrastructure), virtualization admins cannot afford to pass off more complex networking infrastructures to a ‘networking guy’.


HP Gen8 – smarter hardware

One of the greatest things about the evolution of the server industry is the (attempt at) engineering out mistakes. You can only go so far to remove the human element in systems administration, but HP’s doing a really good job with the latest release of their Gen8 server line.

The three features I like the most are the CPU Smart Socket Guide, the Do Not Remove light, and the iPDU system (Intelligent power distribution unit – iPDU).

The CPU Smart Socket guide was co-developed with Intel to remove the too common mishap of bending pins on the motherboard when installing a CPU. Here’s a picture of the CPU in the cradle.

[Image: CPU in the Smart Socket cradle]

The Do Not Remove light comes on when a disk in a RAID set has failed and removing the wrong drive from the server (as you can have many RAID sets) would result in data loss. You can see the indicator in all of its glorious action below.

The iPDU kit works best as a combination of 3 pieces – the special power supply in the server, the special iPDU and Insight Control. The whole system working together accurately measures power utilization, maps servers (Gen8 w/ Platinum power supplies) to PDU ports and verifies redundancy. No more outages because you accidentally plugged both power supplies into the same PDU.

Removing human error from the datacenter, especially the large datacenter, will cut down on outages, data loss, unnecessary parts replacements and hair loss. Well, maybe not hair loss (but we can hope).

The old saying is that ‘you can’t fix stupid’. Well, HP hasn’t exactly done that, but they’ve certainly put up bigger warning signs so you’d have to be stupid on purpose.

 

HP Tech Day – Gen8 Blogger Event

HP and Intel are hosting bloggers today at their Houston campus to do a deep dive on their new Gen8 server platform. You can watch the live stream at http://www.hp.com/go/gen8bloggers and follow along on Twitter with the tags #HPTechday and #Gen8.

The blogroll is:

Frank Owen III
http://www.techvirtuoso.com
@fowen

Michael Letschin
http://www.thesolutionsarchitect.net
@mletschin

Bob Stein
http://www.activewin.com
@ActiveWin

Hector Russo
http://www.geeksroom.com
@geeksroom

Phillip Jaenke
http://rootwyrm.us.to
@rootwyrm

Jeffrey Powers
http://www.geekazine.com
@geekazine

Scott Lowe
http://www.virtualizationadmin.com/Scott_D_Lowe
@otherscottlowe

Hans Vredevoort
http://www.hyper-v.nu
@hvredevoort

Brian Knudtson
http://www.knudt.net/vblog
@bknudtson