NEI Post 011 – Toxic Troubleshooting

I want to address the human element of troubleshooting a bit. As a network engineer or any other type of IT contributor, you will find yourself in troubleshooting scenarios involving people other than yourself. You may collaborate with other members of your team to solve problems, whether you’re asking for help or providing it. You may also coordinate with people outside of your team, such as adjacent technical groups within your own IT organization or IT staff from a customer or vendor. When there are multiple contributors to a troubleshooting session, one key factor determines how efficiently you travel toward the resolution you seek, and that is the flow of information!

I have found that the easier it is to exchange information during troubleshooting, the faster problems get solved. If you are on a solo mission to solve a problem, you are often seeking key bits of information to either lead you toward the solution or confirm a hypothesis. The effort required to pursue these key data points depends on the complexity of the problem and the scarcity of the needed information. The most common starting point for network troubleshooting is obtaining the source and destination IP addresses. I can’t tell you how many times I’ve listened to a long, detailed explanation of some application layer issue that boils down to host A being unable to communicate in some fashion with host B, and then had to utter that oh-so-common phrase, “OK, what are the source and destination IP addresses?” Fishing for that basic information is a never-ending battle that I won’t dwell on here, but the premise still applies: key bits of information are required to solve most issues, with some bits being more basic than others.
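As a rough illustration of treating the source and destination as required inputs, here is a minimal Python sketch of an intake check. The `FlowReport` record and the `missing_details` helper are hypothetical names invented for this example and are not part of any tool mentioned in this post; the only real dependency is the standard library `ipaddress` module.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class FlowReport:
    """Hypothetical intake record for a 'host A cannot reach host B' report."""
    source: str
    destination: str
    symptom: str = ""


def missing_details(report: FlowReport) -> List[str]:
    """Return the names of the fields that are empty or not valid IP addresses."""
    problems = []
    for name in ("source", "destination"):
        value = getattr(report, name).strip()
        try:
            # Accepts both IPv4 and IPv6; raises ValueError on anything else.
            ipaddress.ip_address(value)
        except ValueError:
            problems.append(name)
    return problems


if __name__ == "__main__":
    # Example addresses come from documentation ranges and are placeholders.
    report = FlowReport(source="192.0.2.15", destination="", symptom="app timeout")
    for field in missing_details(report):
        print(f"Still need a valid {field} IP address before we can start.")
```

A check like this will not solve anything by itself, but it makes the first question of the investigation explicit instead of something to fish for mid-call.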

While obtaining the source and destination IP is a common starting place for network investigations, complex IT issues often require many more data points to reach a resolution. This is where the human element comes in on a collaborative troubleshooting session. If you’re on a call or bridge to assist with incident response or problem resolution, you are most likely there as a technical contributor or perhaps a subject matter expert for some slice of the IT stack in play. Your first instinct when joining the investigation may be to listen to the problem summary to determine whether the fault lies within your domain of expertise or responsibility. That’s perfectly logical, since you may need to take action to resolve the issue at hand.

What happens if, based on the information provided, you don’t think your slice of the IT stack is responsible? Your first option is to say nothing and let the others figure out the issue, since it can’t be in your domain. This path can be OK, but only after you present your reasons for thinking you’re not responsible. When you present evidence as to why you think the problem is not X, you may help guide the other contributors down the correct path. Often, crossing potential causes off the list of possibilities will help focus the investigation elsewhere. A second option is to provide as many data points as possible from within your domain of responsibility, equipping the other contributors with information they may not know they need. A team can divide and conquer its way to a dead end, and access to your findings may spark the creativity needed to get past it. If it’s not already obvious, this is the better of the two options and the method I encourage.

One path I encourage you not to take is that of the toxic troubleshooter. I’m not sure what the opposite of a flow state is, but that is usually what the toxic troubleshooter brings to the session: a hindrance to the flow of information that usually starts with the utterance, “It’s not a <insert department> issue.” It’s certainly OK to demonstrate why it’s not a DNS issue, for example, if you have the evidence, but saying it’s not a DNS issue shouldn’t be your entire contribution. I’ve run into scenarios where something similar was uttered with no supporting evidence, just some loose anecdotal theory based on limited information. While the statement may turn out to be true, there are many occasions where the symptoms of a problem point the investigation in one direction when the problem is elsewhere. The root cause can be only a few data points away, yet the path there is long and arduous because it’s not immediately evident what data is needed to solve the problem. That is why it’s helpful to offer up as much information as possible to push the investigation along. Generally, the sooner a problem is fixed, the sooner everyone can disengage and go about their day, so why not help where you can? Don’t be the toxic troubleshooter. Stay engaged and contribute to the flow of information for the greater good!

Happy Hunting!

~Eric Perkins
