A Little History
I’ve been working for this small company for just over a year and a half. This is my first true system administration job. I’ve had several in the past where I did admin work part time, or on the side of my normal duties, but finally, after years of trying, I managed to land a great position. Most of my work consists of Linux servers, running everything from Ubuntu to Debian, CentOS, and even a couple of Slackware servers. Some are in house, some are in the cloud. Feels great to finally be doing what I love. There was just one problem… Windows. You see, many years ago my company was part of a much larger organization with hundreds of employees. This smaller company spun off and took with it 2 Windows Server 2003 systems. These servers once did a lot of things: they managed e-mail (Exchange), printers, file shares, Active Directory, VPN, and internal DNS/DHCP. Since I started, I’ve been fighting the AD (short for Active Directory) servers trying to keep them running (I should note that I’m a pure Linux guy; I hardly know Windows at all, and definitely not Windows Server). We also cut down the number of users who actually used the AD. Most people in this newer, smaller company ran their own versions of Linux.
Damn Windows
A short time ago I informed management that our Windows Servers would not get updates for much longer, and that we should plan on a migration. After talking to Microsoft and getting clarification on pricing, I found that just the upgrade would cost somewhere around $3,000. While this is not much in company money, it also involves a lot of future headaches for me having to keep them running, plus I’ve heard horror stories about the actual upgrade. I took it upon myself to get rid of this crap and just be done with it.
Now don’t get me wrong here. Sure, I’m not proficient with Windows Server, but I wouldn’t get rid of something that worked just fine. I’ve spent a lot of time keeping these servers running. They reboot at random times, half the patches fail to install, and sometimes the services on them will just stop working. The problem here is that these servers do not work! When they do reboot, they take 12 minutes and 34 seconds to boot (quad-core Xeon 3.6GHz, 8GB RAM, 15K HDs on RAID 5). During that time, DNS stops working and I get complaints that the internet isn’t working, or that it is very slow. For those who know what happens when your primary DNS goes down, you know what I’m talking about. I’ve even had times where, after a reboot, the services failed to start. Let’s just say these servers are very broken, and I don’t believe that Microsoft is worth keeping around when so few computers even need to be on Active Directory in the first place.
Let’s Do This
The first step would be to find another DNS/DHCP solution. Easy, I’ve done this many times before. So I have the new server ready to put in place. I should note that we have a weird network configuration, and as of the writing of this portion, I have no idea if my solution will work. We have several VLANs and DHCP is “helped” through the system to get to the correct server. I’m not going to go into the technical details of how this all works (partly because I don’t fully understand it myself), so you’ll just have to hope I figured it all out.
As far as VPN goes, pptp is weak and, from everything I’ve read, not a good one to use. I opted for OpenVPN. Turns out the Windows setup is a little more difficult for the non-tech-savvy users, but we will manage.
What Am I Doing?
Now, the nightmare begins. Removing machines from the AD. I was given 2 computers when I started at this company: a Windows laptop, and a desktop where I could install any Linux distribution I wanted. I started the AD removal on my laptop and everything went perfectly. At this point I figured, how hard could this be? So I removed a Windows Server that manages our phone system (no one touches that machine, we are all too afraid). This one wasn’t too bad, but it did kill the backup user. Once I added it back in, all was good. I’m still getting backups. Next came a very old XP machine that hosts QuickBooks. After removing it from the AD, I couldn’t log in! OK, boot a recovery disk, wipe the SAM Tables (Microsoft’s password database), reboot, add password, done! Woohoo… well, no. Turns out it had a share for our CFO. Crap. It took me a while, but I finally got that share working and he was able to access QuickBooks. As before, this one broke on the backup system, but it was an easy fix, as all I had to do was change the user our backup software uses to connect to the shared folder. All is good.
Before going too far, I want to let you all know that I’m writing this as things are happening. At this point I still have a machine in our conference room, 1 Windows 8, 4 more Windows 7 (one of which I’m worried about since it is our CFO’s machine), and a bunch of XP machines in our dev lab. Yep, XP. We have older equipment where the software will not run on anything above XP, so I have to keep them very hidden and spend a little extra time on them to ensure they are OK.
Anyway, I ended up having another issue when doing our conference room PC. It seems this computer didn’t keep some of the group policy settings that others did. Granted, I was able to just change the backup user, but I actually had to create another share. What a pain, right? To make matters worse, I also had to edit the firewall settings to allow ping (my backup software requires that it can ping the machine) and SMB access from any subnet. You see, the backup software host and most other computers are on different subnets, so I had to adjust for that. Live and learn, I guess.
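A quick way to confirm that a firewall change like this actually took effect is to try opening a TCP connection to the SMB port (445) from the backup host's subnet. The snippet below is only a minimal sketch: the function name `tcp_port_open` is made up for illustration, it checks SMB only (not ping), and a successful connection says nothing about whether the share or the backup user is set up correctly.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; run it from the machine that
    needs the access (for example, the backup host on another subnet).
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

Calling something like `tcp_port_open("conference-pc", 445)` from the backup host (the hostname here is a placeholder) returns `False` when the workstation firewall still blocks SMB from that subnet.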
Pressing On… And On… And On…
Here I am again, Thursday morning (almost a week after starting this whole process), wanting to remove more machines from the AD. Then I thought about it again. Due to the configuration of our switch, and the need to forward DHCP packets correctly to the Windows Servers, what are the chances that this DHCP helper option won’t work correctly with Linux? While the switch does have fantastic documentation, it doesn’t tell you squat about what to do on the DHCP server side. My heart is racing, fears are rising. What if I can’t get this to work? Am I going to be stuck running Windows Server forever simply because I don’t understand this very complex switch? This thing’s config hasn’t been touched since 2008, and even contacting someone who worked with it back then (pretty cool of the guy to talk to me after not being with the company for over 5 years) proved no help, as he worked with 3 other people on getting this thing set up and he didn’t know the CLI password (the CLI is where I have to change these settings). I can’t get a hold of anyone else from back then. Looks like I’m on my own… again.
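For anyone facing the same question: a DHCP “helper” is what RFC 2131 calls a relay agent. It forwards the client’s broadcast to the DHCP server and writes its own address on the client’s VLAN into the `giaddr` field of the packet. The server then uses that address to decide which subnet to hand out a lease from, which is why the server software generally doesn’t care what brand of switch did the relaying. The snippet below is a rough sketch of that selection logic only, not how any particular DHCP server implements it; the function names, subnets, and pool labels are made up for illustration.

```python
import ipaddress
from typing import Dict, Optional


def relay_address(packet: bytes) -> ipaddress.IPv4Address:
    """Extract giaddr (the relay agent address) from a BOOTP/DHCP packet.

    In the fixed BOOTP header, giaddr occupies bytes 24..27 (RFC 2131).
    """
    if len(packet) < 28:
        raise ValueError("packet too short for a BOOTP header")
    return ipaddress.IPv4Address(packet[24:28])


def pick_pool(giaddr: ipaddress.IPv4Address,
              pools: Dict[str, str]) -> Optional[str]:
    """Return the label of the pool whose subnet contains giaddr, if any."""
    for subnet, label in pools.items():
        if giaddr in ipaddress.ip_network(subnet):
            return label
    # No subnet matches the relay address: the server has nothing to offer.
    return None


# Hypothetical VLAN layout, for illustration only.
POOLS = {"10.0.10.0/24": "vlan10-office", "10.0.20.0/24": "vlan20-lab"}
```

The practical takeaway, under these assumptions, is that the new server needs a subnet defined for every VLAN the switch relays from; a request relayed from a VLAN with no matching subnet simply gets no answer.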
Well, I can’t test this today, it would interrupt everyone. Looks like I’m going to have to work on something else for the next couple of days then come in over the weekend.
Fast forward a couple of days, and here I am. Easter Sunday… at work. Oh well, it worked out pretty well this way. Sure, I would like to be spending time with family, but since everyone else is, I figure now is the perfect time to take down the network and move everything over.
So it begins, nice and early. First I disconnect the Windows servers from the network, then I change the IP address associated with the Linux box that will control this network. All services now up and running.
Time to Test
Well crap, it seems Windows computers had some issues getting DNS updates. Sometimes I can refer to another computer by its short name, sometimes only by the full name (meaning adding the internal domain name to it), and even then only sometimes. After spending hours working on it, I still have no solution… I’m starting to think many of these had issues before, and it could have something to do with their own hostnames. After all, Linux likes it when you give it a domain name. Either way, it works as well as before… I think. So I continue on.
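One possible explanation for short names working on some machines and not others (a guess, not a diagnosis) is the DNS search suffix. A client only finds `fileserver` if its resolver knows to try `fileserver.<internal domain>`, and clients usually learn that suffix from the domain name the DHCP server hands out (DHCP option 15) or from local configuration. The sketch below shows roughly how a resolver expands a short name; it is simplified (real resolvers also apply rules such as `ndots`), and `corp.example` is a placeholder domain.

```python
from typing import List


def candidate_names(name: str, search_domains: List[str]) -> List[str]:
    """List the fully qualified names a resolver might try for `name`.

    Simplified model: a name ending in '.' is already fully qualified;
    otherwise each search domain is appended in order, and the bare
    name is tried last.
    """
    if name.endswith("."):
        return [name]
    candidates = [f"{name}.{domain.strip('.')}." for domain in search_domains]
    candidates.append(name + ".")
    return candidates


# With a search suffix, the short name expands to the internal domain first.
print(candidate_names("fileserver", ["corp.example"]))
# Without one, only the bare name is tried, which internal DNS may not answer.
print(candidate_names("fileserver", []))
```

If a machine has no search suffix configured, the first list is never generated, and only the full name resolves.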
Had one Linux server get an IP address outside the DHCP range. I have no idea how this is possible. Screw it, you get a static IP and DNS name. Fixed.
After getting through a couple of machines that just didn’t want to play nice, I got through the rest with hardly an issue. It actually went much more smoothly than I thought it would. After a few hours, the network was up and running again!
Now, unfortunately, I wasn’t done yet. There were a few more items that needed to be dealt with. First, the new OpenVPN. Done. Oh… that’s right, just needed to make a simple change, and everything works. I actually forgot to adjust the server IP in the configuration files to reflect the server’s new IP. Tested, and working great. Cool.
D’oh!
What about pptp, you ask? Well, yes, I did want to get rid of it. The problem was that many of my users were still set up to use it and hadn’t been given their new keys for OpenVPN. I’ll deal with that next week, but the problem remains of ensuring they can still get in. So I fire up a machine and get to testing (using a 4G modem so I’m outside the corporate network). Connecting… connecting… verifying username and password (I didn’t know pptp was this slow, holy crap!)… damn. It just isn’t going to work for me. I’ve actually never used the old VPN, so I have no idea if it would have ever worked before or not. I hate to say it, but I think I’m going to have to wait until tomorrow morning and see who complains.
UPDATE: Upon further investigation, I’ve found that Windows 2003 will NOT let you use VPN unless it can run the DNS/DHCP. Shit… Just another reason to move away from Windows.
Up and Running
Now, where was I? Oh right. So, everything is up and running. I found a few machines that were given static IPs from the Windows servers, but were not listed, so once they got their new addresses I found their services not working. This is because of our firewall rules. So I adjusted the rules and set static IPs for those systems, so they should now always keep those addresses.
Most internal DNS is working without having to put in the domain name. I’ll work on the rest of those throughout the week. I’m not anticipating this will cause any issues for my local users (users from the other office still have to type the internal domain name; I’ll work on that later). I’ve been here for roughly 12 hours now, so maybe now I’ll finally call it a day.
From this point, all I need to do is get all the workstations off the AD that no longer exists. I worry there will be issues if I just leave them alone, since I have no idea how long it will take before these workstations say they can no longer log in because of the missing AD controller. So over the next week I’ll get this taken care of. Just need to get backups working and any mounted drives working for each user, then I’m done! Oh I can’t wait!
Getting There
Fast forward a bit here and another week has passed. During this time I’ve ended up with just 2 desktops that needed to be taken off the old AD. One is Windows 7, the other is 8.1. During this process of getting off the AD, I do have to reboot and perform some work on the firewall, so I like to set up times where I can do this with each user.
First came the Windows 8.1 machine. Oh man do I hate Windows, especially 8.1. This thing caused all sorts of problems. There is so much hidden shit all over Windows that it just drives me insane. I couldn’t get his account to log in because of group policy. That was the actual error: access denied by group policy. So I checked the group policy on the computer… there wasn’t any! Eventually I had to just create another account for him, under a different name, and copy his personal files over. What a joke. I even had the same type of issue with the Windows 7 machine. Which sucked, because the guy handles all our finances and I really didn’t want to cause issues for him. His wasn’t nearly as bad, but I still ended up having to copy his files over to a new account. Fortunately he was able to keep the same user name. Some things ended up in other folders, so after several hours, we located all his files and got him all set up and good to go. I was pretty happy about that. I always like when things work out pretty well when they seem to be going so badly.
Userland
Speaking of Windows and its userland: I hate how Windows handles this. It makes it very hard to move over to a new machine with all your same settings and everything exactly how you like it. I ended up screwing up one email account because the user runs Outlook, and apparently you can’t just “import”. You have to export, then import, and you can’t just copy config files over. This is according to MS! This is why I run Linux. Last time I moved from one machine to another, I copied (scp) my userland over to the new computer, started X (I like KDE), and guess what? Everything was there exactly how I had it on the other computer! Amazing! Also, I know there are tools provided by Microsoft designed to help with this; unfortunately, those don’t work in my situation.
DONE!
So here I am, almost 3 weeks after starting this project, and I’m finally done. Every computer is getting backups, and they can access and be accessed from other VLANs (I know, I know, that’s not how you use VLANs, shut it, I like it). It has been a pain, and I wouldn’t recommend it to anyone, especially if you are the ONLY one doing it and you are not a Windows guy. So in the end, here is my advice. If you are running AD, just keep giving MS all your money and hope it keeps working. If you are not running AD, DON’T! Stay away! If you are going to have it, be sure to hire someone just to handle it, and make sure that is their only job.
Thank you for letting me tell my story, and if you made it this far, good on you!
UPDATE: I wish I had saved the link, but I found something that MS apparently does with the newer versions of Windows Server. I already knew that I would have to pay a lot just for the OS, and then pay an additional price for each user, or CAL as they call it. Well, apparently CALs are not just for users of the Active Directory. You have to have one for each machine that uses the DHCP! I have a lot of Linux servers and even more Virtual Machines. I refuse to keep giving money to MS for each machine just to use DHCP/DNS. That is a load of crap. Some people who commented on the article said you don’t really have to, but if MS decided to audit your network, you could end up having to pay a lot of money. I don’t know how true this is, but I wouldn’t be surprised. Glad I got away from that train wreck.