After upgrading from vCloud Director 1.5.1 to 5.1.2, vShield Manager 5.0.1 to 5.1.2 and vSphere 5.0 to 5.1.0 following all of the Best Practices KBs for each, the time came to upgrade off the vShield Edge Gateways to take advantage of some of the advanced capabilities and performance. When I attempted this via vCloud Director (right-click Edge Gateway and choose ‘Re-deploy’), I was met with this error message:
Cannot redeploy edge gateway BizDev External Network (urn:uuid:f1e69daa-7b56-4e8b-8713-549cfbe8c9f7) org.springframework.web.client.RestClientException: Redeploy failed: Edge connected to ‘dvportgroup-9622’ failed to upgrade.
Inspecting the vCloud Director debug logs revealed this:
2013-05-29 07:42:56,316 | DEBUG | nf-activity-pool-192 | LoggingRestTemplate | Created POST request for "https://10.10.10.56:443/api/2.0/networks/dvportgroup-9622/edge/upgrade" |
2013-05-29 07:42:56,316 | DEBUG | nf-activity-pool-192 | LoggingRestTemplate | Request::URI:https://10.10.10.56/api/2.0/networks/dvportgroup-9622/edge/upgrade method:POST |
2013-05-29 07:42:56,316 | DEBUG | nf-activity-pool-192 | LoggingRestTemplate | Request body :<none> |
2013-05-29 07:42:56,406 | WARN | nf-activity-pool-192 | LoggingRestTemplate | POST request for "https://10.10.10.56:443/api/2.0/networks/dvportgroup-9622/edge/upgrade" resulted in 404 (Not Found); invoking error handler |
2013-05-29 07:42:56,406 | ERROR | nf-activity-pool-192 | NetworkSecurityErrorHandler | Response error xml : <?xml version="1.0" encoding="UTF-8" standalone="yes"?><Errors><Error><code>70001</code><description>vShield Edge not installed for given networkID. Cannot proceed with the operation</description></Error></Errors> |
2013-05-29 07:42:56,407 | DEBUG | nf-activity-pool-192 | EdgeManagerSpock | Failed upgrading edge connected to dvportgroup-9622. |
com.vmware.vcloud.fabric.nsm.error.VsmException: vShield Edge not installed for given networkID. Cannot proceed with the operation
at com.vmware.vcloud.fabric.nsm.error.NetworkSecurityErrorHandler.processException(NetworkSecurityErrorHandler.java:95)
at com.vmware.vcloud.fabric.nsm.error.NetworkSecurityErrorHandler.handleError(NetworkSecurityErrorHandler.java:70)
at org.springframework.web.client.RestTemplate.handleResponseError(RestTemplate.java:486)
at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:443)
at com.vmware.vcloud.fabric.net.utils.impl.LoggingRestTemplate.doExecute(LoggingRestTemplate.java:64)
at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:401)
at org.springframework.web.client.RestTemplate.postForEntity(RestTemplate.java:302)
at com.vmware.vcloud.fabric.net.utils.impl.RestClient.postForLocation(RestClient.java:108)
at com.vmware.vcloud.fabric.nsm.services.spock.EdgeManagerSpock.redeployEdge(EdgeManagerSpock.java:728)
at com.vmware.vcloud.fabric.net.activities.gateway.DeployGatewayActivity$GenerateBacking.invoke(DeployGatewayActivity.java:347)
at com.vmware.vcloud.fabric.foundation.activity.executors.ActivityRunner.run(ActivityRunner.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
2013-05-29 07:42:56,407 | ERROR | nf-activity-pool-192 | DeployGatewayActivity | [Activity Execution] Handle: urn:uuid:f1e69daa-7b56-4e8b-8713-549cfbe8c9f7, Current Phase: com.vmware.vcloud.fabric.net.activities.gateway.DeployGatewayActivity$GenerateBacking, ActivityExecutionState Parameter Names: [BACKING_SPEC, NDC, activitySupervisionRequest, com.vmware.activityEntityRecord.EntityId, REDEPLOY, DEPLOY_PARAMS] - Could not deploy gateway BizDev External Network |
org.springframework.web.client.RestClientException: Redeploy failed: Edge connected to 'dvportgroup-9622' failed to upgrade.
at
-- snip --
2013-05-29 07:42:56,437 | DEBUG | LocalTaskScheduler-Pool-31 | JobString | Job object - Object : BizDev External Network(com.vmware.vcloud.entity.gateway:d21b172b-b926-46e7-8e8b-07fb71843b18) operation name: NETWORK_GATEWAY_REDEPLOY | vcd=83908311-0f60-48e3-a2ec-f10f07c4f187,task=b6261962-0d14-48b0-836b-45fc0d68df65
2013-05-29 07:42:56,486 | DEBUG | LocalTaskScheduler-Pool-31 | CJob | No last pending job : [BizDev External Network(com.vmware.vcloud.entity.gateway:d21b172b-b926-46e7-8e8b-07fb71843b18)], status=[3] | vcd=83908311-0f60-48e3-a2ec-f10f07c4f187,task=b6261962-0d14-48b0-836b-45fc0d68df65
2013-05-29 07:42:56,487 | DEBUG | LocalTaskScheduler-Pool-31 | CJob | Update last job : [BizDev External Network(com.vmware.vcloud.entity.gateway:d21b172b-b926-46e7-8e8b-07fb71843b18)], status=[3], [5/29/13 7:42 AM] | vcd=83908311-0f60-48e3-a2ec-f10f07c4f187,task=b6261962-0d14-48b0-836b-45fc0d68df65
2013-05-29 07:42:56,487 | DEBUG | LocalTaskScheduler-Pool-31 | TaskServiceImpl | Cleaning busy entities for task 'b6261962-0d14-48b0-836b-45fc0d68df65' | vcd=83908311-0f60-48e3-a2ec-f10f07c4f187,task=b6261962-0d14-48b0-836b-45fc0d68df65
2013-05-29 07:42:56,488 | DEBUG | LocalTaskScheduler-Pool-31 | BusyObjectServiceImpl | Unsetting 1 busy entitie(s) for task ref NETWORK_GATEWAY_REDEPLOY(com.vmware.vcloud.entity.task:b6261962-0d14-48b0-836b-45fc0d68df65) | vcd=83908311-0f60-48e3-a2ec-f10f07c4f187,task=b6261962-0d14-48b0-836b-45fc0d68df65
2013-05-29 07:42:56,492 | DEBUG | LocalTaskScheduler-Pool-31 | TaskServiceImpl | Recorded completion of task 'NETWORK_GATEWAY_REDEPLOY(com.vmware.vcloud.entity.task:b6261962-0d14-48b0-836b-45fc0d68df65)' (retry count: 1) | vcd=83908311-0f60-48e3-a2ec-f10f07c4f187,task=b6261962-0d14-48b0-836b-45fc0d68df65
2013-05-29 07:42:56,494 | INFO | LocalTaskScheduler-Pool-31 | LocalTask | completed executing local task NETWORK_GATEWAY_REDEPLOY(com.vmware.vcloud.entity.task:b6261962-0d14-48b0-836b-45fc0d68df65) |
What I quickly realized is that it also affected the ability to modify any existing Edge Gateway IP/NAT/Firewall/VPN settings. If it were just the upgrade that was affected, I probably would have left it for another day.
Through all my searching, I could not find anyone who had a solution that worked for me and most posts ended up saying “call VMware support”. Well, I’m a glutton for punishment and often don’t know when to give up, so I kept at it and I was able to get it working.
I shutdown the new vShield Manager VM and rolled back to the snapshot I took of original vShield Manager VM after the vCloud Director upgrade but before the vShield upgrade. I then started to go through the steps again in this VMware KB: Upgrading to vCloud Networking and Security 5.1.2a best practices guide with a few deviations.
Even though I had enough space to run the main upgrade bundle, I ran the space clearing VMware-vShield-Manager-upgrade-bundle-maintenance-5.0-939118.tar.gz bundle anyway. After that finished, I ran the main 5.1.2 upgrade bundle (VMware-vShield-Manager-upgrade-bundle-5.1.2-943471.tar.gz).
Before I did the backup, deploy new OVF, restore, maintenance bundle upgrade routine in the KB, I went through and did an upgrade of each edge gateway (under the Edges dropdown in the vShield Manager web UI) which worked! In essence, this is a simple re-deploy of a new OVF of the gateway and reconfiguration of the service template with the latest version from the new vShield Manager.
Then I installed the VMware-vShield-Manager-upgrade-bundle-maintenance-5.1.2-997359.tar.gz bundle. After that was all booted back up and stable, I stopped vCloud Director, took a backup of vSM, deployed the new vSM OVF, installed the VMware-vShield-Manager-upgrade-bundle-maintenance-5.1.2-997359.tar.gz bundle to the new install, restored the backup, re-registered vSM with vCenter, started vCD, re-registered vCD with vSM.
Hope this helps someone out.