All. Really.
Suck.
So, I spent Saturday here at work. Normally, I don't do off hours, but some things cannot be done during normal work hours, so here I was. What was I doing? I was upgrading the machine that is our master NIS server and that also provides NFS (home directories, shared applications, the mail spool), DNS, NTP (if you run NFS and don't keep your clocks in sync, you deserve all the pain you suffer), SMTP (thank you Wietse, for Postfix), POP3 and IMAP (I'm looking to dump these, but I've got remote users who want to get their mail this way), HTTP (Apache, mostly serving pages of HTMLized docco for the software providing all these nifty services for the users), PostgreSQL, Samba, FlexLM (network available licenses for: SGI compilers, Matlab, IDL/ENVI, Xpatch, SGI Prodev, Impressario, and other things I'm forgetting no doubt), NetBackup, Perforce, RoboInst (the equivalent of Kickstart in RH Linux, or Jumpstart in Solaris land) and Impressario (SGI's print services software) to a current version of Irix (it was at 6.5.18m, now it's at 6.5.28m).
This machine, as you can see, is a busy box. It's probably hosting a few other services I can't remember off the top of my head as well. It provides these services internally for this office, so while it's down, this office doesn't get much done. Now, this box is a huge single point of failure, but for 70 people, the complexity and expense of having multiple redundant servers is not justified. In five years, it's only lost two HDDs, and those held secondary filesystems. Amazingly enough, the upgrade went smoothly. I yanked the system disk out, put it on my desk, put the shiny new drive in, and started my installation. Worst case scenario, things go badly, I put the old hard drive back in, and try again next weekend. Things went well. In Unix-land, I grew up on SunOS 4 and Irix systems, and have had Solaris thrust upon me, so I know these systems inside and out. This installation went flawlessly. Everything works. All services were back up and running Saturday afternoon. The client machines were all happy.
Except two. I have an O2 and a Tezro that are still at Irix 6.5.22m. This makes sense if you know that 6.5.22 is the last version of Irix to support 32bit MIPS based machines. Irix 6.5.23 and beyond only run on machines based on the MIPS R8000 and above. So, I keep these machines at an old version of Irix for testing purposes (that, and this particular O2 is an RM5200 instead of the R10000 that the rest of my O2s have). These machines, for whatever reason, are having NFS problems. They mount the remote filesystems with no complaints. My startup scripts, because I'm a good admin, all have explicit pathing for everything, so they work just fine.
Pop quiz: What's missing from this picture?
7:54am djinn /home/jamie %ls -la
total 0
7:55am djinn /home/jamie %
That's so awesome. It can't even read . and .. on NFS mounted filesystems. The files? They're there. I can explicitly access any file whose full path and name I know. I can create new files in those directories. I can do anything I'd normally expect to be able to do. Except get a file list. Of course, since my startup scripts use explicit pathing for services started out of NFS shares, I had no idea NFS was having issues. These machines are also the only ones that can compile a certain piece of software, due to a bug on newer Irix versions that is triggered by including both the cmath and valarray headers. Unlike my startup scripts, [g]make uses relative paths and does lots of regexes using file lists. Guess what doesn't work anymore.
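For what it's worth, the symptom above (explicit paths work, listings come back empty) is easy to test for mechanically. Here is a minimal sketch of a canary check, not anything I actually had in place. It assumes Python is available on the client, and the names listing_is_broken and canary are made up for illustration. The self-check at the bottom runs against a throwaway local directory, not a real NFS share.

```python
import os
import tempfile


def listing_is_broken(directory, known_file):
    """Return True when known_file is reachable by explicit path
    but does not show up in the directory listing."""
    path = os.path.join(directory, known_file)
    reachable = os.path.exists(path)              # explicit path lookup
    listed = known_file in os.listdir(directory)  # directory read
    return reachable and not listed


# Self-check on a healthy local directory (a stand-in for the NFS mount).
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "canary"), "w").close()
    healthy_result = listing_is_broken(d, "canary")
    print(healthy_result)
```

On a client showing the problem, pointing this at an NFS mounted directory and a file known to exist there should return True, which is the cue to go look at NFS before [g]make finds out the hard way.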
I spent 5 hours going back and forth with SGI over the phone and via email yesterday (starting at 3, of course; no one noticed before then), and they're trying to figure out both the NFS and compiler problems (these are production systems, kids; only amateurs run production systems without full hardware and software support), but goddamnit, it's annoying. You'd think, since every Indigo, Indy, Indigo2, [Power] Challenge, [Power] Onyx, O2, and Crimson based on the R4x00 and R[M]5x00 out there is stuck at 6.5.22, it's the one legacy Irix version they'd fucking test against. I know it's a 6.5.28 <-> 6.5.22 specific problem. The 6.5.25 machines all work just fine. I thought it might be a 64 <-> 32 bit translation problem, but no, the Tezro is a 64bit clean system (look ma, I keep both a 32bit and a 64bit box available at the necessary release for testing, you'd think I was a professional). Even the ancient Indigo R3000/33 running Irix 5.3 can mount these shares and list them properly:
8:21am skeleton /home/jamie %uname -a
IRIX skeleton 5.3 11091810 IP12 mips
8:21am skeleton /home/jamie %hinv
1 33 MHZ IP12 Processor
FPU: MIPS R3010A VLSI Floating Point Chip Revision: 4.0
CPU: MIPS R3000A Processor Chip Revision: 3.0
...
8:22am skeleton /home/jamie %ls -la
total 39174
drwxr-xr-x 54 jamie staff 4096 Oct 25 08:05 .
dr-xr-xr-x 4 root sys 512 Oct 25 08:21 ..
-rwxr-xr-x 1 jamie staff 15364 Oct 22 15:51 .DS_Store
drwxr-xr-x 3 jamie staff 154 Jun 14 08:01 .OpenOffice.org1.0.3
-rw-r--r-- 1 jamie staff 985 Oct 19 2004 .Sgiresources
....
I wasn't kidding.
And yes, I really do have a 14-year-old machine still running an 11-year-old operating system. Don't ask. It lives in a closet. Its name is skeleton.