vCF 3.x – vRSLCM 2.x upgrade fails after deployment

Intro

At one of my customers we ran into a problem where vRealize Suite Lifecycle Manager (vRSLCM) refused to upgrade. We had already faced quite a few problems with the vCF upgrades, so this one was a nice extra present. It first happened when we upgraded from vCF 3.0.1.1 to 3.5, which upgrades vRSLCM from version 1.2 to 2.0. The same problem came back when we upgraded the BOM from vCF 3.5.1 to 3.7, which upgrades vRSLCM from 2.0 to 2.0 Patch 2. In both upgrades the newly deployed VM didn’t contain any of the settings of the original vRSLCM. So the vRSLCM upgrade failed twice in a row, from 1.2 to 2.0 as well as from 2.0 to 2.0 Patch 2, every time the SDDC Manager deployed the new VM. I thought it was worth documenting.

Problem Recap

So what seems to be the problem? In order for vCF to upgrade vRSLCM, the SDDC Manager deploys a new vRSLCM appliance. It first powers off the original vRSLCM, renames its display name from “vrealize-lcm-v01” to “vrealize-lcm-v01-backup”, and then deploys a new vRSLCM. Both the jump from 1.2 to 2.0 and the patch for 2.0 use this procedure.

The problem occurs during the deployment: the SDDC Manager is unable to import the settings into the new vRSLCM, so you end up with a new, empty vRSLCM and a powered-off backup. According to VMware support and some documentation, the vRSLCM configuration is maintained within the SDDC Manager itself. It therefore doesn’t need the old vRSLCM online in order to export and import the information to the new VM, so the process is not like a vCenter upgrade where you need a temporary IP address. That, then, was not the problem in the process.

The real problem was that the VM is deployed with an expired root password, so the SDDC Manager cannot establish an SSH connection. We found this out when we went through the SDDC Manager logs and saw several lines stating “Could not SSH to vRSLCM!” or “Could not connect to the VM for configuration”. We were also unable to establish an SSH connection with PuTTY or from the SDDC Manager command line.

Because of this, the SDDC Manager cannot import the data into the new vRSLCM during the deployment. In our case only the IP address settings, FQDN and display name were configured; the rest was blank.

Solution Recap

To solve this, we need to prepare the newly deployed vRSLCM with a working default root password and make sure that the SDDC Manager can establish an SSH connection to the new VM. That is difficult, since it doesn’t trust the newly deployed vRSLCM. We therefore need to open up the SSH connection by editing the SSH config file on the new vRSLCM. In some situations we also need to change the SSH config of the SDDC Manager, since the newly deployed VM presents a different host key than the one it expects, so it will refuse to make an SSH connection. Once we have prepared the VM, we rename it, restart the upgrade process, let the SDDC Manager deploy yet another new vRSLCM VM, and once that one has booted, we swap that VM with our prepared one. We do this by quickly powering off the newly deployed VM and powering on the prepared one.

Let’s begin.

Step by Step solution

When you log in to the newly deployed vRSLCM, it will probably reject the customer’s original password.

Since the SDDC Manager only deployed the VM with some basic config, we can log in with the default credentials of a freshly deployed vRSLCM, which are:
Username: admin@localhost (it is literally @localhost)
Password: vmware

Then it will prompt the following screen.

Fill in a password to continue. We’re going to revert it back to “vmware” later to finish the upgrade.
After that you will see this screen.

So almost nothing was exported, it seems. No worries, this is “as designed”.

When you try to change the password back to “vmware“ in this screen, you’ll see that it is not possible, since it doesn’t meet the password criteria. So we go to the console instead (from vSphere).

Open up the console and log in with the new password.

Change the password back to “vmware”.
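For reference, this can also be done non-interactively from the console. A small sketch, assuming you are logged in as root on the appliance:

```shell
# Check the password ageing status first, then reset root's password to "vmware".
chage -l root                   # shows last change / expiry dates
echo 'root:vmware' | chpasswd   # non-interactive; plain `passwd` works too
```

Resetting the password also updates its last-change date, which clears the expired state that was blocking SSH.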

That solves the first problem. Next, however, we need to establish an SSH connection between the SDDC Manager and the new vRSLCM. If you open up PuTTY, you will probably get this screen.


So let’s solve that.

In the console, go to /etc/ssh/
and open sshd_config with the vi editor.

Remove the line “AllowGroups wheel”.
You can edit the file by pressing “i”. This lets you insert or change text.

Exit insert mode with Esc and save/write the change with “:w”.
Quit the vi editor with “:q”.
Some extra Linux commands for those who are unfamiliar with them 😉.

This is supposed to be enough; however, I made a few extra changes, since it didn’t seem to help completely. The other changes I made in sshd_config were:
– Change the “PermitRootLogin” line to yes and remove the leading “#”.
– Change “PubkeyAuthentication” to no.
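The three sshd_config edits above can also be applied non-interactively with sed. The sketch below runs against a scratch copy so the effect is visible; on the vRSLCM appliance you would point CFG at /etc/ssh/sshd_config instead (back the file up first):

```shell
# Demo against a scratch copy; on the appliance, set CFG=/etc/ssh/sshd_config.
CFG=$(mktemp)
cat > "$CFG" <<'EOF'
AllowGroups wheel
#PermitRootLogin no
PubkeyAuthentication yes
EOF

sed -i '/^AllowGroups wheel/d' "$CFG"                               # drop the group restriction
sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin yes/' "$CFG"        # uncomment and allow root login
sed -i 's/^PubkeyAuthentication.*/PubkeyAuthentication no/' "$CFG"  # force password authentication
cat "$CFG"
```

Before restarting the service, `sshd -t` validates the resulting config; it prints nothing when the file is OK.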

Now restart the sshd service with the command:

systemctl restart sshd

If all went well, you should now be able to open an SSH connection with PuTTY.
We are almost there, but there is still one check that you need to do.

Open up a new SSH connection (with PuTTY, for instance) to the SDDC Manager. Log in as vcf and switch to root. Then try to establish an SSH connection to the new vRSLCM. Most probably it still won’t connect, because the SSH host key has changed.

To get past that, change the known_hosts file. Make a snapshot of the SDDC Manager before you do this, in case you mess something up. You can open the file with:

vi /root/.ssh/known_hosts

Then place a “#” in front of the line containing the IP address and/or FQDN of the vRealize LCM instance.
Save the file, and you shouldn’t receive that screen anymore. Don’t forget to change that line back once the upgrade is completed.
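The same edit can be done with a single sed command. The sketch below runs against a scratch file so the effect is visible; on the SDDC Manager the real file is /root/.ssh/known_hosts, and the host names, IPs and keys here are made-up examples:

```shell
# Demo file; on the SDDC Manager the real path is /root/.ssh/known_hosts.
KH=$(mktemp)
cat > "$KH" <<'EOF'
vrealize-lcm-v01,192.168.10.50 ssh-rsa AAAAB3NzaC1yc2EAAAexample1
sddc-manager,192.168.10.40 ssh-rsa AAAAB3NzaC1yc2EAAAexample2
EOF

# Prefix the vRSLCM entry with '#' instead of deleting it, so the original
# key can be restored after the upgrade by removing the '#' again.
sed -i '/vrealize-lcm-v01/s/^/#/' "$KH"
cat "$KH"
```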

Restart the upgrade

There is no really easy way to restart/resume the upgrade and let it continue with your prepared VM.
So as a workaround I powered off the new, prepared vRSLCM and renamed it from “vrealize-lcm-v01” to “vrealize-lcm-v01-prep”. Then I renamed the backup vRSLCM back to “vrealize-lcm-v01”; the SDDC Manager checks the name it has in the vCenter inventory. I powered the old vRSLCM on again and then restarted the upgrade from within the SDDC Manager.

This will go through the same process again.
It will rename the original vRSLCM to “vrealize-lcm-v01-backup” and then start redeploying a new VM. You then have to monitor the progress of this newly deployed VM. Wait until the vRealize GUI is available again in the browser. Once that has happened, power off the newly deployed VM (during the upgrade!!!), rename it to another name, then quickly rename the VM we prepared in the previous steps to the deployment name “vrealize-lcm-v01” and power it on. If done quickly, the SDDC Manager will pick up the prepared vRSLCM on one of its retry attempts and should be able to SSH into it. This way it can configure the VM again and the upgrade should eventually succeed.
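Because the swap has to happen quickly, it can help to script it instead of clicking through the vSphere Client. This is only a sketch using govc, which is not part of the stock tooling; it assumes GOVC_URL/GOVC_USERNAME/GOVC_PASSWORD are set and that the datacenter and VM folder paths match your environment (the ones below are hypothetical). The same renames can of course be done by hand:

```shell
#!/bin/sh
# Hypothetical inventory path -- adjust to your own datacenter and VM folder.
DC_VM=/MyDatacenter/vm

govc vm.power -off -force vrealize-lcm-v01                           # stop the freshly deployed VM
govc object.rename "$DC_VM/vrealize-lcm-v01" vrealize-lcm-v01-empty  # move it out of the way
govc object.rename "$DC_VM/vrealize-lcm-v01-prep" vrealize-lcm-v01   # give the prepared VM the expected name
govc vm.power -on vrealize-lcm-v01                                   # boot the prepared VM
```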

Once everything has continued successfully, it will automatically delete the old vRSLCM, so don’t worry if you see that happen. After the upgrade you can log back in to the UI with the original password that you had before you started the whole upgrade, which also means the upgrade has successfully imported everything. Don’t forget to check the SDDC Manager SSH config file and revert any changes you made.

A little bit clunky, but it does work. I hope it helped you.

Samir

 


2 thoughts on “vCF 3.x – vRSLCM 2.x upgrade fails after deployment”

  1. Hi Samir,

    This is really helpful! Thanks!

    Just some quick Linux tips:
    After editing with vi you can save and quit at the same time:
    Esc
    :wq (to force the write and quit, add a “!” at the end, e.g. :wq!)

    To remove a known host from known_hosts, one can simply run this command:

    ssh-keygen -R <host> (e.g. ssh-keygen -R vrealize-lcm-v01)

    1. Hey Kabir,

      Glad it could help and thanks for the feedback.
      The first one I knew, but the second command I didn’t.
      However, I didn’t want to remove the host from the known_hosts file, since it will have to use the same key after the upgrade. I think that command literally removes the line, which would probably create another issue after the upgrade :).
