Sunday, April 03, 2011

reboot? - this is linux!

Just had a scary few minutes. This morning I was unable to log on to the home mailserver (running Mandriva 2010.1) across the local network: the process connected, opened up a terminal window and just hung, never getting to a prompt! Sunday lunch over, I tried opening a terminal while sitting at the problem machine, and there too, no xterm! Fortunately I was already logged in and had root (admin) access. Looking back in the logs I saw a crash in ssh (which I use to get onto the mailserver from other machines), in ssh-block-Allow. I could understand ssh problems breaking remote access, but why was local access a problem? Even Control-C failed to interrupt whatever was running, and I could check that nothing was hogging the CPU.
I looked through the recent updates to the machine and saw nothing likely, and googling only gave me my own tweet for help(!). So I hurriedly connected the backup drive and did a machine backup (I was glad I already had root access), then tried a reboot. This failed: the machine wouldn't shut down, giving the message:
could not log bootup - already in use
so I resorted to a magic key shutdown (using the SysRq key).
A few deep breaths later the system had rebooted and I logged in. Then I realised the consequence of the ssh crash: one of the system scripts that run when you open a terminal checks whether the machine is running ssh and, if so, asks you to unlock the key for passwordless logins. If ssh was confused, as it evidently was, maybe that script was hanging there.
So I guess killing and restarting sshd would have avoided the reboot. As to why ssh was broken, that's another issue!
At least the server had been up for a month so it wasn't too unstable!
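A terminal that never reaches a prompt because a startup check is stuck is the sort of thing a timeout can guard against. Below is a minimal sketch, not the actual Mandriva script: the helper name `run_with_timeout` is made up for illustration, and it assumes the startup check shells out to a separate command that can still be killed.

```python
import subprocess
import sys


def run_with_timeout(cmd, seconds):
    """Run cmd quietly; return its exit code, or None if it did not finish in time."""
    try:
        completed = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,   # never wait for keyboard input
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so the caller
        # can carry on and still give the user a prompt.
        return None
    return completed.returncode


if __name__ == "__main__":
    # Stand-in for a hung helper: a child that sleeps far longer than we allow.
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    if run_with_timeout(hung, 1) is None:
        print("check timed out, skipping it")
```

For instance, `run_with_timeout(["ssh-add", "-l"], 2)` would give up after two seconds if the agent never answered; whether the script on this machine actually calls `ssh-add` is a guess. A timeout also cannot help if the stuck process is unkillable, which may well be the case after a kernel oops.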

4 comments:

  1. The machine had been up for a month? While that might be considered remarkable uptime for a Windows PC, I suspect something is wrong when a Linux box goes down, ever. Many of the Linux mail servers at my day job have a current uptime of over 1600 days - and I can't recall ever seeing sshd crash on any of them, so that sounds suspicious. Check for foul play or flaky hardware.

  2. Anonymous 5:25 am

    @jake:
    Not all distros are created equal.

  3. Yes, I'll be giving the logs a good look through today. The machine was up to date (I think there was a recent openssh update) and I had long ago disabled password authentication.

  4. ssh version is


    Name        : openssh                       Relocations: (not relocatable)
    Version     : 5.5p1                         Vendor: Mandriva
    Release     : 2.1mdv2010.1                  Build Date: Tue 23 Nov 2010 12:53:22 GMT
    Install Date: Thu 20 Jan 2011 21:34:29 GMT  Build Host: n3.mandriva.com
    Group       : Networking/Remote access      Source RPM: openssh-5.5p1-2.1mdv2010.1.src.rpm


    Looks like a kernel oops -
    Apr 3 04:10:51 faure kernel: BUG: unable to handle kernel paging request at fffffff4
    kernel: IP: [] 0xf481e434
    kernel: *pde = 00017067 *pte = 00000000
    kernel: Oops: 0002 [#1] SMP
    kernel: last sysfs file: /sys/devices/virtual/hwmon/hwmon0/temp1_input
    kernel: Modules linked in:
    kernel: Pid: 5081, comm: ssh-block-Allow Tainted: P B 2.6.36.2-desktop-2mnb #1 SiS-661/
    kernel: EIP: 0060:[] EFLAGS: 00010292 CPU: 1
    kernel: EIP is at 0xf481e434
    kernel: EAX: fffffff4 EBX: f5553d74 ECX: f481e400 EDX: f55649f8
    kernel: ESI: b780b000 EDI: f55649f8 EBP: f481e400 ESP: f4a0bf94
    kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
    kernel: Process ssh-block-Allow (pid: 5081, ti=f4a0a000 task=f5bc1940 task.ti=f4a0a000)
    kernel: Stack:
    kernel: b780a000 f4a0bfac c01e8972 b780a000 0989ef70 00000000 f4a0a000 c0103bdf
    kernel: <0> b780a000 00001000 b767bff4 0989ef70 00000000 bfc4c980 0000005b 0000007b
    kernel: <0> 0000007b 00000000 00000033 0000005b ffffe424 00000073 00000206 bfc4c95c
    kernel: Call Trace:
    kernel: [] ? sys_munmap+0x42/0x60
    kernel: [] ? sysenter_do_call+0x12/0x28
    kernel: Code: 9d 7d f5 70 6a 1e c0 b0 6a 1e c0 00 a0 83 b7 00 00 00 c0 00 00 00 00 00 a0 80 b7 00 20 7a f6 01 00 00 00 01 00 00 00 2b 00 00 00 <01> 00 ff ff 00 00 00 00 3c e4 81 f4 3c e4 81 f4 28 28 00 00 48
    kernel: EIP: [] 0xf481e434 SS:ESP 0068:f4a0bf94
    kernel: CR2: 00000000fffffff4
    kernel: ---[ end trace 6625c8823095f49f ]---

    ssh-block-Allow is a Python script I'm running (I'd forgotten what it was yesterday). It blocks IP addresses that engage in systematic attempts to log in via ssh, adding them to iptables via shorewall.
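For anyone wondering what a script of that kind does, here is a rough sketch of the detection half only. It is not the real ssh-block-Allow: the function name, the regular expression and the threshold are all assumptions, and the step that hands addresses to shorewall is deliberately left out.

```python
import re
from collections import Counter

# Matches two common sshd failure messages and captures the source IPv4 address.
# The exact wording in the logs depends on the sshd version and configuration.
FAILED_LOGIN = re.compile(
    r"sshd\[\d+\]: (?:Failed password for (?:invalid user )?\S+|Invalid user \S+)"
    r" from (\d{1,3}(?:\.\d{1,3}){3})"
)


def offending_addresses(log_lines, threshold=5):
    """Return a sorted list of addresses with at least `threshold` failed attempts."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(1)] += 1
    return sorted(ip for ip, n in counts.items() if n >= threshold)
```

Feeding it the lines of the auth log gives the candidates to block; what happens to them next (a firewall rule, a blacklist file) is up to the surrounding script.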
