It’s 11:30 and you’ve dozed off in front of the TV on a Saturday night when your phone goes off – another callout.
It’s IT Operations on the horn and there’s a problem with one of your Production systems.
You think, “Oh, God, not again.”
You hook up the laptop, call in to the bridge, VPN to the network, and start sniffing around.
Sure enough, the system is down, and it doesn’t look like a quick fix will bring it back. This is going to be an all-nighter.
First things first: depending on your company or client, there is a notification protocol to follow. You also quickly realize that you are going to need other people to help troubleshoot and then fix the system.
The simple truth is that you cannot troubleshoot and do the administrative stuff simultaneously. You’re good, but no one is that good. You need help with the administrative aspects of managing an operations incident. You need a wingman.
Do not be bashful: engage the Operations person to help with the administrative components. Get them to take notes. Have them page out other people. If that is not “in their contract,” get another member of your team to help you. If no one else is available, get your Service Delivery Manager out of bed. This is an outage, and it takes whatever it takes: all hands on deck!
As you work the issue, have your wingman send out notification emails. Ask them to call the business representatives if the outage will affect end users. Have your wingman look at the things you cannot get to while you are in the thick of troubleshooting: logs on other servers, and logs and reports on monitoring systems (SCOM, Splunk, OpenView, Idera, etc.). If you need a break, or if your wingman has stronger skills for investigating the issue at hand, trade roles.
Don’t be a hero. If you’ve been going at it for an hour or two and haven’t gotten the system back up, take a break and let your wingman take the lead. There is no “I” in team.
The Boy Scout motto, “Be Prepared!”, is a good one for IT problem solvers.
Be prepared for issues to occur at the least opportune of times and under the least desirable of circumstances.
If your contact lists are in SharePoint and SharePoint is down, you are in a heap of trouble, son.
Keep your information in SharePoint, sure, but also make hard copies of lists and keep them with you. Put copies in the glove compartment of the car. Put soft copies in a secure place (OneDrive, for example) so you can get to them while traveling or enjoying an evening out.