
Troubleshooting, an art to master

One thing that has always baffled me is how people troubleshoot an issue. I may have unconventional ideas on this topic, but it is truly an art to master. Why am I writing a blog on troubleshooting? Most posts in this forum are about issues that need a fix. In this blog I will explain my view on troubleshooting and share links to Omnissa articles you can use. I hope you will come to enjoy troubleshooting as much as I do. Crazy as it sounds, it was one of the things I enjoyed most as a consultant, next to design workshops.

There are, in general, two types of people when it comes to troubleshooting. There are people who look for a (quick) solution to get going, whom I call the fast-forward people, and there are people who want to understand the issue, the need-to-know-why people.

Fast-forward and need-to-know (my analogy to explain it to you) are also known as top-down and bottom-up. GSS/Support/Engineering folks tend to focus on logs and work upwards, while people on the deployment/architecture side work top-down.

Figure 1 - Troubleshooting approaches.

I am in the second group, the need-to-know group. I need to know how things work, why things break, what is causing it and how we can prevent it from happening again. I understand that issues that halt production need a direct fix, but that should not stop IT from diving into the why. I have done my fair share of issue resolving and troubleshooting in my 25 years of consultancy, and I missed birthday parties, beach vacations and weekends for it.

Let us get into the art to master: troubleshooting.

Fast-forward solving

The fast-forward method in a nutshell: when something stops working, we restart it, kick it, restart services, restart computers, unplug cables, whatever it takes, hoping it will magically start working again. Or we create scripts to work around the issue at hand: if a component expects value x, I make sure value x is present when the service starts. It does solve the issue of something not working, I must give them that. Under pressure from management demanding a fix, any fix is a fix. It will not win the prize for the most beautiful solution, but we are moving again.

The root cause is not going away.

Something is simmering in the background, and we applied a band-aid to the issue so it keeps running. That simmering fire may grow and flare up again, and the band-aid may not be big enough this time. Nothing happens out of nothing.

Something causes the service to stop, the computer to fail, the connection to drop, and all you did was create a workaround. It could happen again at any time, from a different angle, with more impact.

If you do not understand why it happened, the issue is not resolved. It is like replacing a power cable that caught fire due to overheating, without removing the devices that caused the overheating. Not fixing the root cause is like blocking a river and hoping the water will not flow around it. It will come back to haunt you, and the next issue could be more damaging to production. The photo of Krka in Croatia shows how water does not care about obstacles. I often think of fixes as beaver dams: they can block a river, but the water behind them does not go away; it is held back and will cause an issue elsewhere.

Figure 2 - Water will find a way to flow. - Photo by author.

Root cause analysis is crucial in monitoring and troubleshooting; sadly, it takes time to set up and understand.

The whiteboard is your friend on the way to this art degree.

Root cause analysis is something to prepare for. Do not install the sprinklers when the building is on fire; do that before it catches fire. The same goes for network/VDI monitoring: map out every connection and dependency in your network while there is no blocking issue. Below is a lightboard example of such a diagram mapping.

Figure 3 - Lightboard diagram, photo by author.

Components, services, computers, network devices, and endpoints talk to one another. They receive information, and they acknowledge or request information. It gets more complicated: every component depends on other components, often without a direct connection. Think of how DNS and AD play a vital role in making sure components work, even though they rarely show up as a direct link in the flow you are troubleshooting. Without DNS your network is an island. Without AD you are all standing at the gates, unable to log in.

Map every component and its dependencies: who talks to whom, who listens to whom, and which component depends on which other component to work. That will give you a spider web of components and connections. If you can add the data received or sent as well, you will be a root cause expert and soon receive your art degree. I included a diagram, courtesy of eG Innovations, that shows how complex a network can become. Sure, this one is scary, but it also shows why finding the root cause is difficult. Without mapping the whole diagram, how do you know that one server is the reason the frontend is not working properly?

Figure 4 - Courtesy eG Innovations, network topology view.
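To make the spider web idea concrete, here is a minimal Python sketch, assuming you have already written your map down as "component: the components it needs". The component names and the dependency map are hypothetical, loosely modeled on a Horizon setup, and `upstream` and `root_cause_candidates` are illustrative helpers, not part of any Omnissa tool. Given a health result per component, it walks everything a failing component depends on and keeps only the unhealthy dependencies whose own dependencies are all healthy: the deepest point where the web breaks.

```python
from collections import deque

# Hypothetical dependency map: each component lists what it needs to work.
DEPENDENCIES: dict[str, list[str]] = {
    "client": ["gateway", "dns"],
    "gateway": ["connection-server", "dns"],
    "connection-server": ["sql", "ad", "dns"],
    "desktop": ["dhcp", "ad", "dns"],
    "sql": ["dns"],
    "ad": ["dns"],
    "dhcp": [],
    "dns": [],
}


def upstream(component: str, deps: dict[str, list[str]]) -> set[str]:
    """Return every component that `component` depends on, directly or indirectly."""
    seen: set[str] = set()
    queue = deque(deps.get(component, []))
    while queue:  # breadth-first walk over the dependency map
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(deps.get(current, []))
    return seen


def root_cause_candidates(
    failed: str, healthy: dict[str, bool], deps: dict[str, list[str]]
) -> list[str]:
    """Unhealthy dependencies of `failed` whose own dependencies are all healthy.

    Components missing from `healthy` are treated as healthy (not yet checked).
    An empty list means every mapped dependency is fine, so look at `failed` itself.
    """
    suspects = [c for c in upstream(failed, deps) if not healthy.get(c, True)]
    return sorted(
        c for c in suspects
        if all(healthy.get(d, True) for d in upstream(c, deps))
    )


# Example: client, gateway, connection server and AD report problems.
health = {
    "client": False,
    "gateway": False,
    "connection-server": False,
    "sql": True,
    "ad": False,
    "dns": True,
}
print(root_cause_candidates("client", health, DEPENDENCIES))  # ['ad']
```

The client never talks to AD directly, yet AD comes out as the candidate: the gateway and connection server look broken because something they depend on is broken. That is exactly the question a diagram without the full mapping cannot answer.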

When you encounter an issue, you can now check the sections and connections of the spider web. (The examples below focus on a Horizon environment, not on the diagram above.)

  • Can the client reach the gateway?
  • Can the gateway reach the connection server?
  • Is my desktop getting an IP address?
  • Is the virtual desktop getting a Horizon license?
  • Is the virtual desktop getting an RDSCAL?
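The first two questions are plain network reachability checks. The minimal Python sketch below uses only the standard `socket` module and separates the two things that can go wrong on a hop: resolving the name (DNS) and opening the TCP connection. The hostnames and port 443 are hypothetical placeholders; check which ports your own gateway and connection servers actually listen on. The IP address, license and RDSCAL questions need their own consoles and are not covered here.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> tuple[bool, str]:
    """Resolve `host`, then try to open a TCP connection to `port`.

    Returns (ok, detail) so a failure tells you which step broke:
    name resolution or the connection itself.
    """
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, f"DNS lookup failed: {exc}"
    try:
        # create_connection tries each resolved address until one succeeds.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:  # refused, timed out, unreachable, ...
        return False, f"TCP connect failed: {exc}"


# Hypothetical hops; replace hostnames and ports with your own.
HOPS = [
    ("client -> gateway", "uag.example.com", 443),
    ("gateway -> connection server", "cs01.example.internal", 443),
]


def report(hops: list[tuple[str, str, int]]) -> None:
    """Print one OK/FAIL line per hop."""
    for name, host, port in hops:
        ok, detail = can_reach(host, port)
        print(f"{name}: {'OK' if ok else 'FAIL'} ({detail})")
```

Run each check from a machine at the start of that hop: `report(HOPS[:1])` from the client side, `report(HOPS[1:])` from the gateway's network segment. A "DNS lookup failed" result points at name resolution before the network path is even tried.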

We call these baby steps: break the whole connection down into smaller sections and check each section individually. This often involves other experts (hypervisor, DNS, SQL), but as a team you can help each other. On its own, every component looks like a well-behaved child; put them all in a room and you will see the behavior change.

Once you break down the complex diagram, you notice it is not so complex anymore. It becomes smaller, understandable pieces of communication, and one by one you build it out. You will find something out of order: data not received, data not as expected, or a route that just goes into the woods. That will guide you to another flow, or to a component where someone with specific expertise can help you. Together you can solve a root cause; alone you are just fixing an issue.
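The baby-steps walk itself can be sketched as a loop over the sections in the order the traffic flows, stopping at the first one that is out of order. This is a minimal Python sketch; `first_broken_section`, the section names and the placeholder checks (lambdas returning fixed results) are hypothetical stand-ins for real checks such as a reachability probe or a look at the license console.

```python
from typing import Callable, Optional

# A section of the chain: a readable name plus a check that returns True
# when that section behaves as expected.
Section = tuple[str, Callable[[], bool]]


def first_broken_section(sections: list[Section]) -> Optional[str]:
    """Check sections in order and return the name of the first one that
    fails, or None when every section passes."""
    for name, check in sections:
        if not check():
            return name
    return None


# Hypothetical chain with placeholder results, ordered along the connection.
chain: list[Section] = [
    ("client reaches gateway", lambda: True),
    ("gateway reaches connection server", lambda: True),
    ("desktop has an IP address", lambda: False),
    ("desktop gets a Horizon license", lambda: True),
]

print(first_broken_section(chain))  # desktop has an IP address
```

Stopping at the first failure is a deliberate choice: later sections usually depend on earlier ones, so once a section breaks, the results after it say little. The name it returns is the flow or component to bring to the expert who owns it.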

Resources to help you out.

Here are resources that may be helpful in troubleshooting once you have identified the culprit.

  • https://techzone.omnissa.com/resource/understand-and-troubleshoot-horizon-connections - Understand and Troubleshoot Horizon Connections.
  • https://kb.omnissa.com/s/article/89455 - How to read Blast Extreme logs and determine packet loss.
  • https://kb.omnissa.com/s/article/90139 - Troubleshooting Display Issues with the Horizon Blast Protocol - Black or grey screens on connect.
  • https://kb.omnissa.com/s/article/90243 - Guidelines when Troubleshooting Horizon Blast Protocol Performance Concerns.
  • https://kb.omnissa.com/s/article/83088 - Unified Access Gateway (UAG): Troubleshooting Intermittent Blast Connection Issues.
  • https://kb.omnissa.com/s/article/87457 - Horizon Blast Disconnect Codes.
  • https://kb.omnissa.com/s/article/91181 - Horizon Client Blast Error Troubleshooting: VDPCONNECT_PEER_ERROR.
  • https://www.stephenwagner.com/2020/04/04/vmware-horizon-blank-black-screen/ - VMware Horizon blank or black screen.

Enjoy mapping and understanding your environment, and if you run into an issue, do not hesitate to ask for help in this forum; we can all help you. In future blogs we will deep-dive into specific areas of Omnissa products and how to troubleshoot them.

Edited by Rob Beekmans