Sunday, April 03, 2011

reboot? - this is linux!

Just had a scary few minutes. This morning I was unable to log on to the home mailserver (running Mandriva 2010.1) across the local network - the process connected, opened up a terminal window and just hung, never getting to a prompt! Sunday lunch over, I tried opening a terminal while sitting at the problem machine and there too, no xterm! Fortunately I was already logged in and had root (admin) access. Looking back in the logs I saw a crash in ssh (which I use to get onto the mailserver from other machines) - in ssh-block-Allow. I could understand ssh problems breaking remote access - but why was local access a problem? Even a Control-C failed to interrupt whatever was running - and I could check that nothing was hogging the CPU.
I looked through the recent updates to the machine and saw nothing likely, and googling only gave me my own tweet asking for help(!). So I hurriedly connected the backup drive and did a machine backup (I was glad I already had root access), then tried a reboot. This failed - the machine wouldn't shut down, giving the message:
could not log bootup - already in use
so I resorted to a magic key shutdown (using the SysRq key).
A few deep breaths later the system had rebooted and I logged in. Then I realised the consequence of the ssh crash: one of the system scripts that run when you open a terminal checks whether the machine is running ssh and, if so, asks you to unlock the key (for passwordless logins). If ssh is confused - as it evidently was - maybe that script was hanging there.
So I guess killing and restarting sshd would have avoided the reboot. As to why ssh was broken, that's another issue!
At least the server had been up for a month so it wasn't too unstable!
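
If the terminal start-up check really was what hung, one way to stop it blocking every new terminal would be to put a time limit on it. The following is only a sketch in Python, not the actual Mandriva script: the helper names `run_with_timeout` and `agent_has_keys` are made up for illustration, and it assumes the check amounts to asking the agent for its keys with `ssh-add -l`. A timeout like this would probably not help if the process is stuck inside the kernel.

```python
import subprocess


def run_with_timeout(cmd, seconds=5):
    """Run cmd quietly; return its exit code, or None if it hangs or is missing."""
    try:
        result = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=seconds,  # the child is killed once this many seconds pass
        )
        return result.returncode
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return None


def agent_has_keys():
    """Hypothetical check: 'ssh-add -l' exits 0 when the agent holds a key."""
    return run_with_timeout(["ssh-add", "-l"]) == 0
```

A start-up script could call `agent_has_keys()` and simply carry on to the prompt when it returns False, rather than waiting indefinitely.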

4 comments:

Jake said...

The machine had been up for a month? While that might be considered remarkable uptime for a windows pc, I suspect something is wrong when a linux box goes down, ever. Many of the linux mail servers at my day job have a current uptime of over 1600 days - and I can't recall ever seeing sshd crash on any of them, so that sounds suspicious. Check for foul play or flaky hardware.

thegzeus said...

@jake:
Not all distros are created equal.

rajm said...

Yes, I'll be giving the logs a good look through today. The machine was up to date - I think there was a recent openssh update - and I had long ago disabled password authentication.

rajm said...

ssh version is


Name        : openssh                      Relocations: (not relocatable)
Version     : 5.5p1                        Vendor: Mandriva
Release     : 2.1mdv2010.1                 Build Date: Tue 23 Nov 2010 12:53:22 GMT
Install Date: Thu 20 Jan 2011 21:34:29 GMT Build Host: n3.mandriva.com
Group       : Networking/Remote access     Source RPM: openssh-5.5p1-2.1mdv2010.1.src.rpm


Looks like a kernel oops -
Apr 3 04:10:51 faure kernel: BUG: unable to handle kernel paging request at fffffff4
kernel: IP: [] 0xf481e434
kernel: *pde = 00017067 *pte = 00000000
kernel: Oops: 0002 [#1] SMP
kernel: last sysfs file: /sys/devices/virtual/hwmon/hwmon0/temp1_input
kernel: Modules linked in:
kernel: Pid: 5081, comm: ssh-block-Allow Tainted: P B 2.6.36.2-desktop-2mnb #1 SiS-661/
kernel: EIP: 0060:[] EFLAGS: 00010292 CPU: 1
kernel: EIP is at 0xf481e434
kernel: EAX: fffffff4 EBX: f5553d74 ECX: f481e400 EDX: f55649f8
kernel: ESI: b780b000 EDI: f55649f8 EBP: f481e400 ESP: f4a0bf94
kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
kernel: Process ssh-block-Allow (pid: 5081, ti=f4a0a000 task=f5bc1940 task.ti=f4a0a000)
kernel: Stack:
kernel: b780a000 f4a0bfac c01e8972 b780a000 0989ef70 00000000 f4a0a000 c0103bdf
kernel: <0> b780a000 00001000 b767bff4 0989ef70 00000000 bfc4c980 0000005b 0000007b
kernel: <0> 0000007b 00000000 00000033 0000005b ffffe424 00000073 00000206 bfc4c95c
kernel: Call Trace:
kernel: [] ? sys_munmap+0x42/0x60
kernel: [] ? sysenter_do_call+0x12/0x28
kernel: Code: 9d 7d f5 70 6a 1e c0 b0 6a 1e c0 00 a0 83 b7 00 00 00 c0 00 00 00 00 00 a0 80 b7 00 20 7a f6 01 00 00 00 01 00 00 00 2b 00 00 00 <01> 00 ff ff 00 00 00 00 3c e4 81 f4 3c e4 81 f4 28 28 00 00 48
kernel: EIP: [] 0xf481e434 SS:ESP 0068:f4a0bf94
kernel: CR2: 00000000fffffff4
kernel: ---[ end trace 6625c8823095f49f ]---

ssh-block-Allow is a Python script I'm running (I'd forgotten what it was yesterday) which blocks IP addresses that engage in systematic attempts to log in via ssh, adding them to iptables via shorewall.
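
For anyone curious, the general idea behind a script like that can be sketched roughly as below. This is not the real ssh-block-Allow, just a minimal illustration: the log line format, the threshold of 5 and the function names are all assumptions, and it only builds `shorewall drop` commands rather than running them.

```python
import re
from collections import Counter

# Assumed sshd log formats; adjust the pattern to match your own auth log.
FAILED_LOGIN = re.compile(
    r"(?:Failed password for (?:invalid user )?\S+|Invalid user \S+)"
    r" from (\d{1,3}(?:\.\d{1,3}){3})"
)


def offending_ips(log_lines, threshold=5):
    """Return, sorted, the addresses with at least `threshold` failed attempts."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(1)] += 1
    return sorted(ip for ip, n in counts.items() if n >= threshold)


def block_commands(ips):
    """Build (but do not run) one 'shorewall drop' command per address."""
    return [["shorewall", "drop", ip] for ip in ips]
```

The command lists could then be passed to `subprocess.run` as root. Keeping the parsing separate from the blocking makes it easy to try the pattern out on an old log first.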