Sometimes you have a day when you think, “Why did I ever get into IT?” Just when everything seems to be running smoothly, an unexpected “blip” happens that seems to have no logical explanation. Then you remember that you sadistically enjoy solving these ambiguous problems and you start to dig.
I had such a problem this morning when I got an email from our overseas office telling me that they couldn’t log in. No problem, I thought, just a simple password reset and all should be fine. I fired up Remote Desktop to get to the Domain Controller but it wouldn’t let me log in under my admin account. Suddenly my server-spider-sense started to tingle. I knew this wasn’t going to be a quick fix. Oh, and I should say the DC is a virtual machine hosted in Microsoft’s Hyper-V (2008 R2 edition).
Luckily, this was a child domain, so I could log in using my account from the parent domain. A quick look at the events in Server Manager showed a whole page of yellow triangles (bad) and red circles (worse). The gist of the problem was that the DC couldn’t see the global catalog (sic) server. To me, that pointed at a network/DNS issue. This confused me slightly, as I could access the server remotely without a problem. I could ping it from various servers but I couldn’t see out the other way. I launched a command prompt and started off with a simple ipconfig command. The answer I got was:
Unable to contact IP driver. General failure
A quick Google revealed article A and article B (among others) saying this seemed to point to an issue with duplicate Security IDs (SIDs) on the network. This was also confusing, as Microsoft have often stated that, despite releasing tools like Sysinternals NewSID, it doesn’t actually cause any problems if a computer has the same SID as another. Mark Russinovich debunks the duplicate SID myth here, but I also had first-hand experience of running 500 computers with identical SIDs to no ill effect (back in my School Technician days). I was pretty sure my VMs didn’t have duplicate SIDs, as I used a rather wonderful WDS/MDT combo to install them properly. Even so, I got hold of PsGetSid and scanned the network. It didn’t pick up a single duplicate, so I was back to the drawing board.
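If you want to repeat the duplicate check yourself, the comparison is simple once you have a SID for each machine. The following is only a rough sketch in Python: it assumes you have already collected the machine SIDs (for example by noting down what PsGetSid reports for each host) into a dictionary. The host names and SID values in the sample are made up for illustration, and the function name is my own.

```python
from collections import defaultdict


def find_duplicate_sids(host_sids):
    """Return {sid: [hosts]} for every SID reported by more than one host.

    host_sids maps a host name to the machine SID string collected for it.
    """
    hosts_by_sid = defaultdict(list)
    for host, sid in host_sids.items():
        # Normalise whitespace and case so the same SID always groups together.
        hosts_by_sid[sid.strip().upper()].append(host)
    return {
        sid: sorted(hosts)
        for sid, hosts in hosts_by_sid.items()
        if len(hosts) > 1
    }


# Hypothetical sample data: these hosts and SIDs are invented for the example.
sample = {
    "DC": "S-1-5-21-1111111111-2222222222-3333333333",
    "FILE01": "S-1-5-21-4444444444-5555555555-6666666666",
    "WEB01": "S-1-5-21-1111111111-2222222222-3333333333",
}

print(find_duplicate_sids(sample))
```

An empty dictionary means no two hosts share a SID, which is the result my own scan gave.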
I wish I could categorically say what I did to fix this but it was one of those tasks when time is against you and you just need to fix it ASAP! So I present to you everything I did in the hopes that, if you are suffering similar problems, you can pluck out a juicy nugget.
The server is named DC in the overseas.company.lan domain.
It is a Windows Server 2008 R2 Enterprise guest VM on a Windows Server 2008 R2 Enterprise Hyper-V host.
- My first step was to remove the virtual network adapter from the DC.
  - This requires shutting down the VM and going into the settings via Hyper-V Manager.
  - Simply remove the network adapter and click Apply.
  - You can then go to the Add Hardware section and put a new one back in.
  - WARNING: if you have set up DHCP with a reservation for the server, bear in mind it will get a new MAC address. So set up a new reservation or configure the VM with a static MAC matching the old one.
- Run a few checks in the command prompt.
  - DCDiag is your friend. Use the /q switch so that it only reports errors.
  - My DCDiag came up with a load of Secure Channel and Replication errors (unsurprisingly).
  - You can use NTDSUtil to troubleshoot secure channel errors.
  - If you need a refresher then have a look at this great NTDSUtil tutorial.
- Next, I went through the DNS Server on the DC to check that it had all the correct settings.
  - The Name Servers (on the zone properties) only had the host names of the parent DNS Servers. I put in their IP addresses and got a couple of nice shiny green ticks.
  - I also made sure Zone Transfers were allowed to the listed Name Servers.
  - Finally, I threw in a Secondary Zone copy of the parent domain (company.lan) for good measure.
- Turn it off and on again.
  - Do not underestimate the power of a reboot! The server isn’t working anyway, so there is no harm in doing this.
  - You could just restart the relevant services, but a full reboot makes sure you don’t miss one.
  - I rebooted after each of the three steps above and finally I could log in properly. Whether this was due to the final reboot or some time-based replication in the background, I’m not sure.
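Since I was re-running the same checks after every reboot, here is a rough sketch of how they could be scripted in Python. It is not what I ran on the day, just an illustration under a few assumptions: it is run on a Windows machine where dcdiag is on the PATH, and the function names and the name server host name in the usage comments are hypothetical. The /q and /s: switches are the standard DCDiag ones.

```python
import socket
import subprocess


def build_dcdiag_command(server=None, quiet=True):
    """Build the dcdiag command line as a list of arguments."""
    cmd = ["dcdiag"]
    if server:
        cmd.append(f"/s:{server}")  # /s: targets a specific domain controller
    if quiet:
        cmd.append("/q")  # /q prints error messages only
    return cmd


def run_dcdiag(server=None):
    """Run dcdiag quietly and return its non-empty output lines.

    With /q, an empty list suggests that no errors were reported.
    Only works where dcdiag is installed (a Windows server or admin machine).
    """
    result = subprocess.run(
        build_dcdiag_command(server),
        capture_output=True,
        text=True,
        check=False,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def resolve_name_servers(hostnames, resolver=socket.gethostbyname):
    """Map each name server host name to its IPv4 address, or None if lookup fails."""
    resolved = {}
    for name in hostnames:
        try:
            resolved[name] = resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            resolved[name] = None
    return resolved


# Example usage on the DC itself (the name server below is a made-up host name):
#   errors = run_dcdiag()
#   print("dcdiag clean" if not errors else "\n".join(errors))
#   print(resolve_name_servers(["ns1.company.lan"]))
```

A name server that comes back as None is one the DC cannot resolve by host name, which is the situation that led me to enter the parent DNS servers by IP address.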
Hopefully this will help others who get stuck with a similar issue. Now it’s time to dig through the event logs to see if I can find out what caused it in the first place!