vCF 3.x – SDDC Manager fails to poll or fetch info within the webUI

Intro

During several vCF upgrades, one issue kept coming back in the SDDC Manager: it is unable to show any kind of information or data within the web UI.

General Reason for failing

There can be several reasons and scenarios why this happens. The bottom line, however, is that not all the services within the SDDC Manager have started and/or the Postgres database is not running. Most of the time the service in question is the LCM service or the Operations Manager service. The database and the services are closely connected and often depend on each other, so when the database isn’t running, some of the services won’t start either. When that is the case, the UI simply doesn’t show you the information, and you probably can’t do anything within the SDDC Manager at all.

Overview of root causes I encountered

Like I said, the bottom line is that one of the SDDC Manager services has not started or that the Postgres database isn’t running. The following scenarios can cause this to happen.

  1. You just rebooted the SDDC Manager, and the services are still starting up.
    • Have patience 😉
  2. Caching issue with your web browser.
    • Try an incognito window or clear your cache.
  3. After an upgrade, the Operations Manager service isn’t running.
    • Possibly the database is also not running.
  4. After an upgrade, the LCM service isn’t running.
    • Possibly the database is also not running.
  5. After an upgrade, the Postgres database couldn’t start up because of a blocking value or ID that was created during the upgrade.
  6. After a reboot, a corrupted value or line blocks the Postgres database from starting up.
    • This value could have arisen while the VM was already running. In that case it may not have blocked the database or caused any problems while it was running, and it only becomes apparent and blocking after a reboot.

How to Troubleshoot this issue

There are a few ways to troubleshoot this issue.
Let’s sum them up.

1. Open up a console to the SDDC Manager and check which services aren’t starting during the reboot
This is the quickest and easiest way to pinpoint the root cause. During the reboot, the SDDC Manager clearly shows which services are starting and which are not. It even shows the command you can issue to troubleshoot it.

2. Check the logs within the SDDC Manager
So the best place to troubleshoot is of course the logs. There are multiple logs for different services, which can be found under:

/var/log/vmware/vcf

The log file I usually check first is lcm-debug.log, which is in the “/var/log/vmware/vcf/lcm” folder. Most of the time this log gives the best information to troubleshoot the issue. It is especially helpful during upgrades.
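To avoid scrolling through the whole file, you can filter the log down to the most recent lines that look like errors. This is only a minimal sketch: the helper name recent_errors and the keywords it searches for (error, exception, failed) are my own assumptions, not something the SDDC Manager provides, so adjust them to what you actually see in your log.

```shell
#!/bin/sh
# Hypothetical helper: print the last N lines of a log file that look like errors.
# Usage: recent_errors <logfile> [count]   (count defaults to 20)
recent_errors() {
    logfile=$1
    count=${2:-20}
    # -i: case-insensitive, -E: extended regex so several keywords can be matched.
    # The keyword list is an assumption; tune it to your own log.
    grep -iE 'error|exception|failed' "$logfile" | tail -n "$count"
}
```

For example, `recent_errors /var/log/vmware/vcf/lcm/lcm-debug.log 50` prints the last 50 matching lines of the LCM debug log.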

3. Check the “Known Issues” section within the vCF Release Notes

This is definitely also one of the best places to check. Some issues are known for specific upgrades or versions of the SDDC Manager, so check whether yours is listed.

It is, by the way, a really good habit to check the release notes before you start with the upgrades. You don’t have to read every single thing, but skimming the “Known & Solved Issues” before you begin can save you a lot of headaches.

4. Revert the snapshot and troubleshoot the before and after state

This is only applicable if you have issues after an upgrade, but there are several things you can do after reverting.

  • First, it is good to check if the issue happens again after you restart the upgrade. With a little luck, the SDDC Manager just comes back up without any problems. If not, you at least know that the issue arises from the upgrade.
  • Another good thing to do is to check if the issue also happens after a reboot before you upgrade. This way you know if something was already wrong before the upgrade.
  • Compare the lcm-debug.log and the boot process before and after the upgrade. This makes it easier to find the differences.
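The before and after comparison of the logs can be sketched as a small shell function. It assumes you saved a copy of the log in both states and that error lines contain the literal word ERROR. Both are assumptions on my side, as is the helper name new_errors. Everything in front of ERROR (timestamps, thread names) is stripped, so identical messages logged at different times are treated as the same line.

```shell
#!/bin/sh
# Hypothetical helper: print error messages that appear in the "after" log
# but not in the "before" log.
# Usage: new_errors <before.log> <after.log>
new_errors() {
    before=$1
    after=$2
    tmp_before=$(mktemp)
    tmp_after=$(mktemp)
    # Keep only the part of each line starting at ERROR, then de-duplicate.
    sed -n 's/.*\(ERROR.*\)/\1/p' "$before" | sort -u > "$tmp_before"
    sed -n 's/.*\(ERROR.*\)/\1/p' "$after" | sort -u > "$tmp_after"
    # comm -13 hides lines unique to the first file and lines common to both,
    # leaving only the lines that are new in the second file.
    comm -13 "$tmp_before" "$tmp_after"
    rm -f "$tmp_before" "$tmp_after"
}
```

You could call it as `new_errors lcm-debug-before.log lcm-debug-after.log`, where both file names are just placeholders for your own copies.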

Some Solutions that solved this issue for me

There are several ways that I solved this issue, both simple and complex.

Caching Issue

First, always clear the cache in your web browser, and/or use a different web browser or switch to incognito mode, just to be sure that it is not a caching issue.

Rebooting the Server

Sometimes lady luck is around the corner, and a simple reboot does the trick. Did you try turning it off and on again?

Starting or restarting the services through the cli

Sometimes a dependency was not ready when the service started. Restarting that same service may let it find the dependency and start successfully. It also sometimes helped me find the culprit when I went back and viewed the results in the logs.
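A quick way to see which services need a (re)start is to ask systemd for the state of each unit. The sketch below assumes the SDDC Manager services run as systemd units. The helper name inactive_units is made up, and the unit names in the usage example are my assumption, so confirm them with `systemctl list-units` on your own appliance first.

```shell
#!/bin/sh
# Hypothetical helper: print every unit from the argument list that systemd
# does not report as active.
# Usage: inactive_units <unit> [<unit> ...]
inactive_units() {
    for unit in "$@"; do
        # is-active exits non-zero for anything other than "active";
        # --quiet suppresses the state text so only our own output is printed.
        if ! systemctl is-active --quiet "$unit"; then
            echo "$unit"
        fi
    done
}
```

For example, `inactive_units postgres lcm operationsmanager` lists the units that are down (unit names assumed). Each of them can then be restarted with `systemctl restart <unit>` and inspected with `systemctl status <unit>`.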

Trying to unlock the database by deleting the persistent state

Sometimes a database can have a lock that prevents it from starting up. This can sometimes be solved by removing the persistent state.

psql --host=localhost -U postgres -d sddc_manager_ui


Fill in your password.

Then type:

delete from persistent_state;

Quit the connection to the database with:

\q

For more info, check one of my other blog posts. It is not about this topic, but it goes into more detail on how you can delete the persistent state.
https://www.vsam.pro/vcf-sddc-manager-upgrade-fails-at-sddc-manager-ui-component

Removing a password with too many characters that blocked the database

This was a specific issue that I had, which I may write more about in another blog post. The bottom line was that one of the passwords of the managed components within the SDDC Manager was too long. In our case, an NSX node accidentally had its password pasted twice into the SDDC Manager password database. The passwords for all the components were originally 12 characters, but that one NSX node ended up with 24 characters. You wouldn’t think that this could cause any issues. However, later on we found out that the 24-character password exceeded 255 characters once it was encrypted. As a result, the encrypted password was larger than the field in the Postgres database could hold, since that field was limited to 255 characters. Changing the password or extending the field size solved the issue. This problem appeared after an upgrade. So use the lookup_passwords command within the SDDC Manager to check if there are any passwords with a high number of characters. Since that day, I never use a password for VMware that is longer than 20 characters 😆.
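Checking the password lengths by eye is error-prone, so a small filter can help. This is only a sketch under stated assumptions: the helper name long_values is made up, and the input format (one name and one value per line, separated by a tab) is hypothetical. The real password lookup output looks different, so you would have to bring it into this shape yourself. Pipe the data in directly rather than saving passwords to a file.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every entry whose value is longer
# than the limit (default 20 characters).
# Input on stdin: one "name<TAB>value" pair per line (an assumed format).
# Usage: long_values [limit]
long_values() {
    limit=${1:-20}
    awk -F '\t' -v limit="$limit" \
        'length($2) > limit { print $1 " (" length($2) " chars)" }'
}
```

With the doubled 12-character password from my case, an entry like nsx-node-1 would be reported as 24 characters long, while the normal 12-character entries stay silent.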

Clearing duplicate or blocking values & IDs within the Postgres database

Just like the password with 24 characters that blocked the database, duplicate values or IDs can also block the database. Removing those can solve the issue. However, once you reach this point, I definitely do not recommend trying things out and fiddling with it if you are using vCF in a production environment. You are best off involving GSS, especially if you don’t know what you are doing.

I wrote a blog post about it, since I ran into this problem in a more specific form during an upgrade.
Check it here.

Removing IDs within the transaction logs

Sometimes the upgrade creates weird or random values, which are most of the time saved in the transaction log. Clearing those IDs one by one can help the database start up again. However, just like the previous solution, once you reach this point it is a good idea to involve GSS if you don’t know what you are doing.

Alright hope that this post may have some value for you.
Let me know if this helped you in any kind of way.

 
