Adding Hardware Monitoring to RHEL 4 WS
Thursday, 05 April 2007 21:30

Part of Information Security is making sure your data is protected against loss, regardless of how the loss occurs. Any properly written Information Security policy will have a section about Business Continuity, which documents what to do in major disasters. But in addition, business continuity means making sure you've got regular backups of all your critical data and applications.

As an extension of this sort of policy, one should engage in routine hardware monitoring to try to detect hardware failures before they can cascade into bigger failures. In this article, I'll describe the technical steps to install and configure hardware monitoring on a Red Hat Enterprise Linux 4 Workstation. In theory, these steps would be the same for the Advanced Server version.

Red Hat provides some of the software we need to enable hardware monitoring. Most likely the packages are already installed, but make sure that you have them before proceeding:
  1. kernel-utils
  2. lm_sensors 
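
A quick way to confirm both packages are present is to query the RPM database. The following is a minimal sketch rather than part of the original procedure: the function name missing_packages and the PKG_QUERY override are hypothetical conveniences, and "rpm -q" is assumed to be the query command.

```shell
#!/bin/sh
# Print the name of every package in the argument list that is not installed.
# PKG_QUERY is a hypothetical hook (default: "rpm -q") so the function can be
# tried on a machine that has no RPM database.
missing_packages() {
    for pkg in "$@"; do
        # Left unquoted on purpose so "rpm -q" splits into command + option.
        ${PKG_QUERY:-rpm -q} "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}
```

Running "missing_packages kernel-utils lm_sensors" should print nothing if both packages are installed, and the name of each missing package otherwise.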

However, to get full functionality we need to download and install some additional packages:

  1. gnome-applet-sensors-1.5.2-1.el4.rf (Download from DAG Repository)
  2. hddtemp-0.3-0.beta15.1.el4.rf (Download from DAG repository)

Note: If you do not use Gnome as your desktop environment, then the gnome-applet-sensors package will be useless to you. But all the cool people use Gnome, so we won't worry about that.

All of these packages should install without a hitch, but if you get any errors, then I recommend following traditional troubleshooting methods or contacting Red Hat support for assistance. If you are not familiar with installing packages and manually configuring services under Linux, I strongly recommend you stop here.


Once all the necessary packages are installed, a number of configuration changes are needed. Red Hat does not provide any sort of auto-detection method for hardware monitoring, so the exact changes needed will depend on your hardware. In this case, I was using an HP NC6000 laptop and will use that as the basis for my configuration examples.
  1. Configure Sensors
    Open a terminal window and su to root. Issue the command "sensors-detect", which will interactively try and determine what hardware monitoring sensors exist in your system. Just hit enter at each prompt to accept the defaults, until you get to the summary. For the HP NC6000, it looks like this:
    Driver `adm1031' (should be inserted):
      Detects correctly:
      * Bus `SMBus I801 adapter at 1200' (Algorithm unavailable)
        Busdriver `i2c-i801', I2C address 0x2c
        Chip `Analog Devices ADM1031' (confidence: 7)

    Driver `eeprom' (should be inserted):
      Detects correctly:
      * Bus `SMBus I801 adapter at 1200' (Algorithm unavailable)
        Busdriver `i2c-i801', I2C address 0x50
        Chip `SPD EEPROM' (confidence: 8)
      * Bus `SMBus I801 adapter at 1200' (Algorithm unavailable)
        Busdriver `i2c-i801', I2C address 0x51
        Chip `SPD EEPROM' (confidence: 8)
    Make a note of the sensors detected and what modules they use. Then go ahead and respond to the prompt about ISA versus smbus by hitting return.  You should get some example commands to use in your configuration files. In the case of the HP NC6000, I got:
    To make the sensors modules behave correctly, add these lines to /etc/modules.conf:

    #----cut here----
    # I2C module options
    alias char-major-89 i2c-dev
    #----cut here----

    To load everything that is needed, add this to some /etc/rc* file:

    #----cut here----
    # I2C adapter drivers
    modprobe i2c-i801
    # I2C chip drivers
    modprobe adm1031
    modprobe eeprom
    # sleep 2 # optional
    /usr/bin/sensors -s # recommended
    #----cut here----
    And finally, you'll be asked if you want to generate /etc/sysconfig/lm_sensors. Accept the default, which is yes. This will create a configuration file for the lm_sensors service, telling it what kernel modules to interface with.
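
    If you would rather not copy the driver names by hand, the summary that sensors-detect prints can be filtered. This is a rough sketch that assumes the summary lines look exactly like the NC6000 example above ("Driver `name' (should be inserted):"); the function name list_sensor_drivers is hypothetical, as is the file name in the usage note.

```shell
#!/bin/sh
# Read a sensors-detect summary on standard input and print one driver name
# per line. Assumes lines of the form:  Driver `adm1031' (should be inserted):
list_sensor_drivers() {
    # Split each line on the backtick and the single quote, so the driver
    # name becomes the second field.
    awk -F"[\`']" '/^ *Driver .*should be inserted/ { print $2 }'
}
```

    For example, after pasting the summary into a file such as sensors-summary.txt, "list_sensor_drivers < sensors-summary.txt" would list adm1031 and eeprom for the NC6000.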

    We're not done yet. As the sensors-detect program told us, we have to modify two more configuration files. Go ahead and edit /etc/modprobe.conf and add the appropriate lines to the end of the file. In the case of the NC6000 that's:
    # I2C module options
    alias char-major-89 i2c-dev
    Save the changes. This sets an alias for one of the modules, which takes effect the next time it's loaded into memory. Next, edit /etc/rc.d/rc.sysinit to add the commands that load the modules we'll need for monitoring. I recommend inserting these just past the networking part of the script:
    ### Hardware monitoring
    # I2C adapter drivers
    load_module i2c-i801
    # I2C chip drivers
    load_module adm1031
    load_module eeprom
    /usr/bin/sensors -s # recommended
    echo -n $" hw monitoring"
    Note that again, depending on what hardware you have, the specific modules loaded will vary. You may not be able to do hardware monitoring at all if your system's chipset is obscure.
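
    Once the modules have been loaded, whether by a reboot or by running the modprobe commands by hand, it can be handy to confirm they are really present. A minimal sketch, assuming a 2.6 kernel that lists loaded modules in /proc/modules with dashes shown as underscores; the function name and the MODULES_FILE override are hypothetical.

```shell
#!/bin/sh
# Print each named module that is NOT listed in /proc/modules.
# MODULES_FILE is a hypothetical override, useful only for trying the
# function against a saved copy of the module list.
unloaded_modules() {
    for mod in "$@"; do
        # The kernel reports i2c-i801 as i2c_i801, so normalise the name.
        name=$(echo "$mod" | sed 's/-/_/g')
        grep -q "^$name " "${MODULES_FILE:-/proc/modules}" || echo "$mod"
    done
}
```

    On the NC6000, "unloaded_modules i2c-i801 adm1031 eeprom" should print nothing when all three modules are in memory.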

    This completes the sensor configuration. Reboot your system. The necessary kernel modules should load into memory and the lm_sensors service should start up and map to those modules. If all is well, when you enter the command "sensors -A" at a command prompt you'll get output similar to this:
    eeprom-i2c-0-51
    Memory type:            DDR SDRAM DIMM
    Memory size (MB):       1024

    eeprom-i2c-0-50
    Memory type:            DDR SDRAM DIMM
    Memory size (MB):       1024

    adm1031-i2c-0-2c
    CPU Fan:  1323 RPM  (min = 2008 RPM, div = 2)
    Case Fan: 1323 RPM  (min = 2008 RPM, div = 2)
    SYS Temp:  +45.0°C  (low  =    +0°C, high =   +60°C)
    SYS Crit:    +85°C
    CPU Temp:  +47.6°C  (low  =   +41°C, high =   +56°C)
    CPU Crit:   +127°C
    AUX Temp:  +56.9°C  (low  =  -128°C, high =   +70°C)
    AUX Crit:    +85°C
    Note that if the lm_sensors service does not start automatically at boot, you can manually start it with the command "/etc/init.d/lm_sensors start". And you can force it to always start at boot with the command "chkconfig lm_sensors on".
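
    The "sensors -A" output is easy to post-process if you want a quick check from a script. The following is a minimal sketch that assumes the line layout shown above (a label, a colon, the reading, then "high = limit" in parentheses); the function name readings_over_high is hypothetical.

```shell
#!/bin/sh
# Read "sensors -A" output on standard input and print the label of every
# reading that is above its "high" limit. Lines without "high =" (fans,
# memory details, critical limits) are ignored.
readings_over_high() {
    awk -F: '/high =/ {
        label = $1
        sub(/^ */, "", label)          # drop leading indentation

        reading = $2
        sub(/^[^-+0-9]*/, "", reading) # keep from the first sign or digit on
        reading += 0                   # use the leading numeric part only

        high = $2
        sub(/.*high = */, "", high)    # keep what follows "high ="
        high += 0

        if (reading > high) print label
    }'
}
```

    Used as "sensors -A | readings_over_high", this prints nothing for the readings above, since each temperature is below its high limit.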

    Once you've got the sensors stuff configured and functional, on to the next step, which should be much simpler.
  2. Configure HDDTemp
    The hddtemp package does not rely on the lm_sensors package, as it talks specifically to temperature sensors in hard drives rather than on-board sensors on a mainboard. Not all hard drives have temperature sensors, but most newer models from manufacturers such as Toshiba, Samsung and Maxtor do.

    Assuming the hddtemp package installed properly, you should be able to type in the command "hddtemp /dev/hda" and get a reading for your primary ATA drive. If you have SCSI drives, try "hddtemp /dev/sda". Note that if you are using any sort of RAID then hddtemp will not be able to read the temperature sensors of the individual drives in the RAID.

    In order to make hddtemp run as a daemon and constantly check the drive temperatures, you'll need to issue a command such as "hddtemp -d -l 127.0.0.1 /dev/hda" at boot-up. The hddtemp package doesn't include an init script, so here's one you can copy and paste into /etc/init.d/hddtemp:
    #!/bin/sh
    #
    # hddtemp       This shell script takes care of starting and stopping hddtemp.
    #
    # chkconfig: 2345 90 10
    # description: hddtemp provides information about hard drives' temperature
    # processname: hddtemp
    # config: /etc/sysconfig/hddtemp
    # pidfile: /var/run/hddtemp

    # Source networking configuration.
    . /etc/sysconfig/network

    # Check that networking is up.
    [ "${NETWORKING}" = "no" ] && exit 0

    # hddtemp configuration
    [ -f /etc/sysconfig/hddtemp ] && . /etc/sysconfig/hddtemp

    # Source function library.
    if [ -f /etc/init.d/functions ] ; then
      . /etc/init.d/functions
    elif [ -f /etc/rc.d/init.d/functions ] ; then
      . /etc/rc.d/init.d/functions
    else
      exit 0
    fi

    RETVAL=0
    prog=hddtemp

    # Backwards compatibility.
    [ -z "$HDDTEMP_OPTIONS" -a -n "$HDDTEMPARGS" ] && \
      HDDTEMP_OPTIONS="$HDDTEMPARGS"
    HDDTEMP_OPTIONS="$HDDTEMP_OPTIONS $HDDTEMP_DAEMON_OPTIONS"

    start() {
        echo -n $"Starting hard disk temperature monitor daemon"
        daemon /usr/sbin/$prog -d $HDDTEMP_OPTIONS
        RETVAL=$?
        echo
        [ $RETVAL -eq 0 ] && touch /var/lock/subsys/hddtemp
        return $RETVAL
    }

    stop() {
        echo -n $"Stopping hard disk temperature monitor daemon"
        killproc $prog
        RETVAL=$?
        echo
        [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/hddtemp
        return $RETVAL
    }

    restart() {
        stop
        start
    }

    # See how we were called.
    case "$1" in
        start|stop|restart)
            $1
            ;;
        reload|force-reload)
            restart
            ;;
        status)
            status $prog
            ;;
        try-restart|condrestart)
            [ ! -f /var/lock/subsys/hddtemp ] || restart
            ;;
        *)
            echo $"Usage: $0 {start|stop|status|restart|try-restart|reload|force-reload}"
            exit 2
    esac
    (This was obtained from an earlier release of the package, but isn't included in the Red Hat RPM used for this project).

    After creating the above script, make it executable with "chmod 755 /etc/init.d/hddtemp" and then issue the command "chkconfig hddtemp on" so the daemon will start every time you boot. In addition, you'll need to create a file called /etc/sysconfig/hddtemp and put the following text in it:
    # hddtemp(8) daemon options.  Add at least the disk(s) you want to monitor
    # here.
    HDDTEMP_OPTIONS="-l 127.0.0.1 /dev/hda"
    Note that "/dev/hda" should be the primary hard drive of your system. If you have more than one drive, list them separated by spaces (e.g. "/dev/hda /dev/hdb").
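
    Once the daemon is running, other programs can read the temperatures from it over TCP. The hddtemp daemon should listen on port 7634 by default and answer with a single line of "|"-separated fields (device, model, temperature, unit) for each drive; treat both details as assumptions and check the hddtemp man page for your version. The sketch below only parses such a reply, and the function name is hypothetical.

```shell
#!/bin/sh
# Parse a reply from the hddtemp daemon read on standard input, for example
#   |/dev/hda|SOME MODEL|38|C|
# and print "device temperature unit" for each drive found.
parse_hddtemp_reply() {
    awk -F'|' '{
        # Each drive occupies five "|"-separated slots; the first is empty.
        for (i = 2; i + 3 <= NF; i += 5)
            print $i, $(i + 2), $(i + 3)
    }'
}
```

    If netcat happens to be installed, something like "nc 127.0.0.1 7634 | parse_hddtemp_reply" should then print one line per monitored drive.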
  3. Configure Smartd
    The last hardware monitoring tool used is smartmontools, which in Red Hat Enterprise Linux 4 is bundled into the kernel-utils RPM. S.M.A.R.T. stands for Self-Monitoring, Analysis, and Reporting Technology, and it deals specifically with hard drives. Just about all modern hard drives have S.M.A.R.T. capability. Smartmontools includes a daemon, smartd, which logs S.M.A.R.T. information, as well as the command-line tool smartctl for directly querying the S.M.A.R.T. status of hard drives.

    Open up a root terminal window and issue the command "smartctl -a /dev/hda". You should get screenfuls of information to page through. Note, as before, that you may need to specify "/dev/sda" if you have SCSI devices. Once you've verified you can read the S.M.A.R.T. data from the drive, you can proceed with setting up smartd. If you can't read the S.M.A.R.T. data, you should review the manufacturer's documentation for your make and model of drive.

    Make sure the smartd service is set to load on boot by issuing the command "chkconfig smartd on". Next, edit the file "/etc/smartd.conf" and add the following text to the end of the file:
    # This entry is for the primary ATA drive
    /dev/hda -d ata -o on -S on  \
        -s (S/../.././09|C/../../(1|3|5|7)/10|L/../../5/15) \
        -a -m root
    Note that you'll need to change "/dev/hda" to the appropriate device for your primary drive, and "-d ata" will need to be "-d scsi" for SCSI drives. Lastly, the "-m root" can be changed to include whatever additional e-mail addresses you want to receive S.M.A.R.T. error notifications (e.g. "-m root,bob@aol.com").
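
    The "-s" argument is a regular expression that smartd matches against a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 for Monday, and hour). Below is a small sketch for checking when an expression of this shape would fire. It uses "grep -E" as a stand-in for smartd's own matching, which is an assumption; the function name is hypothetical, and the conveyance-test weekdays are written as (1|3|5|7).

```shell
#!/bin/sh
# Schedule expression in the style of the smartd.conf entry:
#   S = short test every day during hour 09
#   C = conveyance test on weekdays 1, 3, 5 and 7 during hour 10
#   L = long test on weekday 5 (Friday) during hour 15
SCHEDULE='(S/../.././09|C/../../(1|3|5|7)/10|L/../../5/15)'

# Succeed if the given T/MM/DD/d/HH string matches the whole expression.
schedule_matches() {
    echo "$1" | grep -Eq "^${SCHEDULE}\$"
}
```

    For example, "schedule_matches S/04/03/2/09" succeeds (a short test on a Tuesday during hour 09), while "schedule_matches S/04/03/2/10" fails.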

    Save the changes to "/etc/smartd.conf" and then issue the command "/etc/init.d/smartd restart" to make the changes effective. You can then look in the /var/log/messages log for S.M.A.R.T. info with a command such as "grep smartd /var/log/messages":
    Apr  2 16:14:48 svasp1 smartd[3210]: smartd version 5.33 [i386-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
    Apr  2 16:14:48 svasp1 smartd[3210]: Home page is http://smartmontools.sourceforge.net/
    Apr  2 16:14:48 svasp1 smartd[3210]: Opened configuration file /etc/smartd.conf
    Apr  2 16:14:48 svasp1 smartd[3210]: Configuration file /etc/smartd.conf parsed.
    Apr  2 16:14:48 svasp1 smartd[3210]: Device: /dev/hda, opened
    Apr  2 16:14:48 svasp1 smartd[3210]: Device: /dev/hda, not found in smartd database.
    Apr  2 16:14:49 svasp1 smartd[3210]: Device: /dev/hda, is SMART capable. Adding to "monitor" list.
    Apr  2 16:14:49 svasp1 smartd[3210]: Device: /dev/hda, opened
    Apr  2 16:14:49 svasp1 smartd[3210]: Device: /dev/hda, not found in smartd database.
    Apr  2 16:14:49 svasp1 smartd[3210]: Device: /dev/hda, enabled SMART Attribute Autosave.
    Apr  2 16:14:49 svasp1 smartd[3210]: Device: /dev/hda, enabled SMART Automatic Offline Testing.
    Apr  2 16:14:49 svasp1 smartd[3210]: Device: /dev/hda, is SMART capable. Adding to "monitor" list.
    Apr  2 16:14:49 svasp1 smartd[3210]: Monitoring 2 ATA and 0 SCSI devices
    Apr  2 16:14:50 svasp1 smartd[3212]: smartd has fork()ed into background mode. New PID=3212.
    Apr  2 16:44:51 svasp1 smartd[3212]: Device: /dev/hda, SMART Usage Attribute: 193 Load_Cycle_Count changed from 96 to 95
    Apr  3 09:14:51 svasp1 smartd[3212]: Device: /dev/hda, starting scheduled Short Self-Test.
    Apr  3 12:14:50 svasp1 smartd[3212]: Device: /dev/hda, SMART Usage Attribute: 9 Power_On_Hours changed from 96 to 95
    Apr  4 09:14:51 svasp1 smartd[3212]: Device: /dev/hda, starting scheduled Short Self-Test.
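    To pick the attribute changes out of a busy log, the smartd lines can be filtered further. This is a minimal sketch, assuming the log format shown above with each entry on a single line in /var/log/messages; the function name is hypothetical.

```shell
#!/bin/sh
# Read syslog lines on standard input and print, for every smartd
# "SMART Usage Attribute" entry: device, attribute name, old and new value.
smart_attribute_changes() {
    awk '/smartd.*SMART Usage Attribute:/ {
        dev = ""; attr = ""; old = ""; cur = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "Device:")    { dev = $(i + 1); sub(/,$/, "", dev) }
            if ($i == "Attribute:") { attr = $(i + 2) }  # skip the numeric ID
            if ($i == "from")       { old = $(i + 1) }
            if ($i == "to")         { cur = $(i + 1) }
        }
        print dev, attr, old "->" cur
    }'
}
```

    A typical invocation would be "grep smartd /var/log/messages | smart_attribute_changes".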
    Congratulations! You've now got all the hardware monitoring functionality in place. The final step lets you tie the sensors together in a graphical monitoring tool.
  4. Configure the Sensors Applet
    Assuming you are using Gnome, you can add the Sensors applet to one of your Gnome panels for real-time monitoring of your hardware. Right-click on the panel of your choice and select "Add to Panel". From the list of available applets, select the "Hardware Sensors Monitor" and click the Add button.

    Once added, just right-click on the applet and select Preferences to customize it. You should be able to enable readings from the hddtemp daemon as well as temperature sensors for CPU and mainboard. Additionally, you can add fan speed readouts.

    Each sensor enabled in the applet can be configured to alert if its reading goes below or above a set amount, and optionally execute a specific command. This can be used to sound audible alerts, send e-mails or even perform a system shutdown. Additionally, you may configure the name of the sensor as displayed in the applet, the font size and temperature units used.

    Screenshot of sensors applet

    I recommend playing simple audible alerts before doing anything with e-mail or more advanced scripting, as you need to get a "feel" for the normal operating ranges of your hardware. While you can research what the specified temperature ranges are for your CPU and drives, each sensor can have a certain amount of error in its calibration. And brief periods of high-temperature operation for a drive or CPU are not of significant concern.

    Note that the sensors applet does not support display of S.M.A.R.T. data; however, you will still receive e-mail notifications about errors from smartd.

I hope you find this article useful in setting up hardware monitoring on your system. For other Linux or UNIX operating systems the steps are fundamentally the same, with the biggest issues being obtaining the necessary software for your platform and getting the right configurations in place for your particular hardware's sensors.

You can find out more about setting up hardware monitoring by searching for the topics "linux hardware monitoring", "linux sensors" and "linux smartd".