Part of Information Security is making sure your data is protected against loss, regardless of how the loss occurs. Any properly written Information Security policy will have a section about Business Continuity, which documents what to do in major disasters. But in addition, business continuity means making sure you've got regular backups of all your critical data and applications. As an extension of this sort of policy, one should engage in routine hardware monitoring to detect hardware failures before they can cascade into bigger failures. In this article, I'll describe the technical steps to install and configure hardware monitoring on a Red Hat Enterprise Linux 4 Workstation. In theory, these steps would be the same for the Advanced Server version.
Red Hat provides some of the software we need to enable hardware monitoring. Most likely the packages are already installed, but make sure that you have them before proceeding:
- kernel-utils
- lm_sensors
However, to get full functionality we need to download and install some additional packages:
- gnome-applet-sensors-1.5.2-1.el4.rf (Download from DAG Repository)
- hddtemp-0.3-0.beta15.1.el4.rf (Download from DAG repository)
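A quick way to confirm that the packages are present is to ask rpm about each of them. The following is a minimal sketch, not part of the original procedure: `missing_packages` is a hypothetical helper name, and it relies on `rpm -q` printing `package NAME is not installed` for anything that is absent.

```shell
# Hypothetical helper: read `rpm -q` output on stdin and print only
# the names of packages that rpm reports as not installed.
missing_packages() {
    sed -n 's/^package \(.*\) is not installed$/\1/p'
}

# Example use (run on the Red Hat system itself):
#   rpm -q kernel-utils lm_sensors hddtemp gnome-applet-sensors | missing_packages
```

If the pipeline prints nothing, all four packages are installed.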
Note: If you do not use Gnome as your desktop environment then the gnome-applet-sensors package will be useless to you. But all the cool people use Gnome, so we won't worry about that. All of these packages should install without a hitch, but if you get any errors then I recommend following traditional troubleshooting methods, or contacting Red Hat support for assistance. If you are not familiar with installing packages and manually configuring services under Linux, I strongly recommend you stop here.
Once all the necessary packages are installed, a number of configuration changes are needed. Red Hat does not provide any sort of auto-detection method for hardware monitoring, so the exact changes needed will depend on your hardware. In this case, I was using an HP NC6000 laptop and will use that as the basis for my configuration examples.
- Configure Sensors
Open a terminal window and su to root. Issue the command "sensors-detect", which will interactively try to determine what hardware monitoring sensors exist in your system. Just hit enter at each prompt to accept the defaults, until you get to the summary. For the HP NC6000, it looks like this:

Driver `adm1031' (should be inserted):
  Detects correctly:
  * Bus `SMBus I801 adapter at 1200' (Algorithm unavailable)
    Busdriver `i2c-i801', I2C address 0x2c
    Chip `Analog Devices ADM1031' (confidence: 7)

Driver `eeprom' (should be inserted):
  Detects correctly:
  * Bus `SMBus I801 adapter at 1200' (Algorithm unavailable)
    Busdriver `i2c-i801', I2C address 0x50
    Chip `SPD EEPROM' (confidence: 8)
  * Bus `SMBus I801 adapter at 1200' (Algorithm unavailable)
    Busdriver `i2c-i801', I2C address 0x51
    Chip `SPD EEPROM' (confidence: 8)

Make a note of the sensors detected and what modules they use. Then go ahead and respond to the prompt about ISA versus SMBus by hitting return. You should get some example commands to use in your configuration files. In the case of the HP NC6000, I got:

To make the sensors modules behave correctly, add these lines to /etc/modules.conf:

#----cut here----
# I2C module options
alias char-major-89 i2c-dev
#----cut here----

To load everything that is needed, add this to some /etc/rc* file:

#----cut here----
# I2C adapter drivers
modprobe i2c-i801
# I2C chip drivers
modprobe adm1031
modprobe eeprom
# sleep 2 # optional
/usr/bin/sensors -s # recommended
#----cut here----

And finally, you'll be asked if you want to generate /etc/sysconfig/lm_sensors. Accept the default, which is yes. This will create a configuration file for the lm_sensors service, telling it what kernel modules to interface with.
We're not done yet. As the sensors-detect program told us, we have to modify two more configuration files. Go ahead and edit /etc/modprobe.conf and add the appropriate lines to the end of the file. In the case of the NC6000 that's:

# I2C module options
alias char-major-89 i2c-dev

Save the changes. What this does is set an alias for one of the modules the next time it's loaded into memory. Next, edit /etc/rc.d/rc.sysinit to add the commands to load the modules we'll need for monitoring. I recommend inserting these just past the networking part of the script:

### Hardware monitoring
# I2C adapter drivers
load_module i2c-i801
# I2C chip drivers
load_module adm1031
load_module eeprom
/usr/bin/sensors -s # recommended
echo -n $" hw monitoring"

Note that again, depending on what hardware you have, the specific modules loaded will vary. You may not be able to do hardware monitoring at all if your system's chipset is obscure.
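sensors-detect suggests plain modprobe commands, while the rc.sysinit lines shown earlier use load_module. As a rough sketch, assuming you saved the sensors-detect summary to a text file, the translation could be automated like this. The function name and the file path in the usage comment are illustrative, not something the tools provide.

```shell
# Hypothetical helper: turn each "modprobe NAME" line read on stdin
# into the "load_module NAME" form used in the rc.sysinit example.
# Comment lines and anything else are dropped.
modprobe_to_load_module() {
    sed -n 's/^[[:space:]]*modprobe[[:space:]]\{1,\}\([A-Za-z0-9_-]*\).*/load_module \1/p'
}

# Example use (the path is an assumption):
#   modprobe_to_load_module < /root/sensors-detect.txt
```

Review the result before pasting it into rc.sysinit, since the module list is specific to your hardware.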
This completes the sensor configuration. Reboot your system. The necessary kernel modules should load into memory and the lm_sensors service should start up and map to those modules. If all is well, when you enter the command "sensors -A" at a command prompt you'll get output similar to this:

eeprom-i2c-0-51
Memory type: DDR SDRAM DIMM
Memory size (MB): 1024

eeprom-i2c-0-50
Memory type: DDR SDRAM DIMM
Memory size (MB): 1024

adm1031-i2c-0-2c
CPU Fan: 1323 RPM (min = 2008 RPM, div = 2)
Case Fan: 1323 RPM (min = 2008 RPM, div = 2)
SYS Temp: +45.0°C (low = +0°C, high = +60°C)
SYS Crit: +85°C
CPU Temp: +47.6°C (low = +41°C, high = +56°C)
CPU Crit: +127°C
AUX Temp: +56.9°C (low = -128°C, high = +70°C)
AUX Crit: +85°C

Note that if the lm_sensors service does not start automatically at boot, you can manually start it with the command "/etc/init.d/lm_sensors start". And you can force it to always start at boot with the command "chkconfig lm_sensors on".
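Once "sensors -A" produces readings, a small filter can flag any temperature that is above its "high" limit. The sketch below assumes each reading is on its own line in the form `NAME: +47.6°C (low = +41°C, high = +56°C)`; `over_limit` is a hypothetical name, and fan lines (which report "min" rather than "high") are ignored.

```shell
# Hypothetical helper: read `sensors -A` style text on stdin and print
# the name of every sensor whose current reading exceeds its "high" limit.
over_limit() {
    awk -F: '/high *=/ {
        name = $1
        rest = $2
        # The first signed number after the colon is the current reading.
        if (match(rest, /[-+][0-9]+(\.[0-9]+)?/) == 0) next
        cur = substr(rest, RSTART, RLENGTH) + 0
        # The number following "high =" is the limit.
        sub(/.*high *= */, "", rest)
        if (match(rest, /[-+]?[0-9]+(\.[0-9]+)?/) == 0) next
        high = substr(rest, RSTART, RLENGTH) + 0
        if (cur > high) print name
    }'
}

# Example use:
#   sensors -A | over_limit
```

No output means every temperature is within its configured range.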
Once you've got the sensors configured and functional, move on to the next step, which should be much simpler.
- Configure HDDTemp
The hddtemp package does not rely on the lm_sensors package, as it specifically talks to temperature sensors in hard drives rather than on-board sensors on a mainboard. Not all hard drives have temperature sensors but most newer models from manufacturers such as Toshiba, Samsung and Maxtor do.
Assuming the hddtemp package installed properly, you should be able to type in the command "hddtemp /dev/hda" and get a reading for your primary ATA drive. If you have SCSI drives, try "hddtemp /dev/sda". Note that if you are using any sort of RAID then hddtemp will not be able to read the temperature sensors of the individual drives in the RAID.
In order to make hddtemp run as a daemon and constantly check the drive temperatures, you'll need to issue a command such as "hddtemp -d -l 127.0.0.1 /dev/hda" at boot-up. The hddtemp package doesn't include an init script, so here's one you can copy and paste into /etc/init.d/hddtemp:

#!/bin/sh
#
# hddtemp This shell script takes care of starting and stopping hddtemp.
#
# chkconfig: 2345 90 10
# description: hddtemp provides information about hard drives' temperature
# processname: hddtemp
# config: /etc/sysconfig/hddtemp
# pidfile: /var/run/hddtemp

# Source networking configuration.
. /etc/sysconfig/network

# Check that networking is up.
[ "${NETWORKING}" = "no" ] && exit 0

# hddtemp configuration
[ -f /etc/sysconfig/hddtemp ] && . /etc/sysconfig/hddtemp

# Source function library.
if [ -f /etc/init.d/functions ] ; then
    . /etc/init.d/functions
elif [ -f /etc/rc.d/init.d/functions ] ; then
    . /etc/rc.d/init.d/functions
else
    exit 0
fi

RETVAL=0
prog=hddtemp
# Lock file created by start, removed by stop, checked by condrestart.
lockfile=/var/lock/subsys/$prog

# Backwards compatibility.
[ -z "$HDDTEMP_OPTIONS" -a -n "$HDDTEMPARGS" ] && \
    HDDTEMP_OPTIONS="$HDDTEMPARGS"
HDDTEMP_OPTIONS="$HDDTEMP_OPTIONS $HDDTEMP_DAEMON_OPTIONS"

start() {
    echo -n $"Starting hard disk temperature monitor daemon"
    daemon /usr/sbin/$prog -d $HDDTEMP_OPTIONS
    RETVAL=$?
    echo
    [ $RETVAL -eq 0 ] && touch $lockfile
    return $RETVAL
}

stop() {
    echo -n $"Stopping hard disk temperature monitor daemon"
    killproc $prog
    RETVAL=$?
    echo
    [ $RETVAL -eq 0 ] && rm -f $lockfile
    return $RETVAL
}

restart() {
    stop
    start
}

# See how we were called.
case "$1" in
    start|stop|restart)
        $1
        ;;
    reload|force-reload)
        restart
        ;;
    status)
        status $prog
        ;;
    try-restart|condrestart)
        [ ! -f $lockfile ] || restart
        ;;
    *)
        echo $"Usage: $0 {start|stop|status|restart|try-restart|reload|force-reload}"
        exit 2
esac

(This was obtained from an earlier release of the package, but isn't included in the Red Hat RPM used for this project).
Make sure you issue the command "chkconfig hddtemp on" after creating the above script, so the daemon will start every time you boot. In addition, you'll need to create a file called /etc/sysconfig/hddtemp and put the following text in it:

# hddtemp(8) daemon options. Add at least the disk(s) you want to monitor
# here.
HDDTEMP_OPTIONS="-l 127.0.0.1 /dev/hda"

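With the daemon listening on 127.0.0.1, you can check that it answers by connecting to it. The sketch below rests on a few assumptions: the daemon is on hddtemp's default TCP port 7634, the `nc` (netcat) utility is installed, and the reply uses the pipe-separated form `|/dev/hda|MODEL|41|C|`, with one such group per drive. `parse_hddtemp` is a hypothetical helper name.

```shell
# Hypothetical helper: convert the daemon's pipe-separated reply
# (read on stdin) into one "DEVICE TEMPERATURE" line per drive.
parse_hddtemp() {
    awk -F'|' '{
        # Each drive occupies five fields: "", device, model, value, unit.
        for (i = 2; i + 3 <= NF; i += 5)
            printf "%s %s%s\n", $i, $(i + 2), $(i + 3)
    }'
}

# Example use (port 7634 and nc are assumptions):
#   nc 127.0.0.1 7634 | parse_hddtemp
```

If the connection is refused, check that the init script started the daemon and that the -l address matches the one you are connecting to.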
Note that "/dev/hda" in HDDTEMP_OPTIONS should be the primary hard drive of your system. If you have more than one drive, list them separated by spaces (e.g. "/dev/hda /dev/hdb").
- Configure Smartd
The last hardware monitoring tool used is smartmontools, which in Red Hat Enterprise Linux 4 was integrated into an RPM called kernel-utils. S.M.A.R.T. stands for Self-Monitoring, Analysis, and Reporting Technology, and it deals specifically with hard drives. Just about all modern hard drives have S.M.A.R.T. capability. Smartmontools includes a daemon, smartd, which logs S.M.A.R.T. information, as well as the command-line tool smartctl for directly querying the S.M.A.R.T. status of hard drives.
Open up a root terminal window and issue the command "smartctl -a /dev/hda". You should get screenfuls of information to page through. Note, as before, that you may need to specify "/dev/sda" if you have SCSI devices. Once you've verified you can read the S.M.A.R.T. data from the drive, you can proceed with setting up smartd. If you can't read the S.M.A.R.T. data, you should review the manufacturer's documentation for your make and model of drive.
Make sure the smartd service is set to load on boot by issuing the command "chkconfig smartd on". Next, edit the file "/etc/smartd.conf" and add the following text to the end of the file:

# This entry is for the primary ATA drive
/dev/hda -d ata -o on -S on \
-s (S/../.././09|C/../../(1|3|5|7|)/10|L/../../5/15) \
-a -m root

Note that you'll need to change "/dev/hda" to the appropriate device for your primary drive, and "-d ata" will need to be "-d scsi" for SCSI drives. And lastly, the "-m root" can be changed to include whatever additional e-mail addresses you want to receive S.M.A.R.T. error notifications (e.g. "-m root,bob@aol.com").
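The "-s" argument in that entry is a regular expression that smartd compares against a stamp of the form T/MM/DD/d/HH, where T is the test type (S for short, C for conveyance, L for long), d is the day of the week (1 is Monday, 7 is Sunday) and HH is the hour. The sketch below shows how the pattern reads, using grep -E as a stand-in for smartd's own matching. The `due_tests` name is hypothetical, the pattern is anchored here for clarity, and the empty alternative after the 7 has been left out because a one-digit weekday can never match it.

```shell
# Schedule from the smartd.conf entry, minus the empty alternative.
schedule='(S/../.././09|C/../../(1|3|5|7)/10|L/../../5/15)'

# Hypothetical helper: given a "MM/DD/d/HH" stamp, print each test
# type (S, C or L) that the schedule would start at that time.
due_tests() {
    for t in S C L; do
        if printf '%s\n' "$t/$1" | grep -Eq "^${schedule}\$"; then
            printf '%s\n' "$t"
        fi
    done
}

# Example use for the current hour:
#   due_tests "$(date '+%m/%d/%u/%H')"
```

Read this way, the entry requests a short self-test every day at 09:00, a conveyance test at 10:00 on Mondays, Wednesdays, Fridays and Sundays, and a long self-test at 15:00 on Fridays.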
Save the changes to "/etc/smartd.conf" and then issue the command "/etc/init.d/smartd restart" to make the changes effective. You can then look in the /var/log/messages log for S.M.A.R.T. info with a command such as "grep smartd /var/log/messages":

Apr 2 16:14:48 svasp1 smartd[3210]: smartd version 5.33 [i386-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Apr 2 16:14:48 svasp1 smartd[3210]: Home page is http://smartmontools.sourceforge.net/
Apr 2 16:14:48 svasp1 smartd[3210]: Opened configuration file /etc/smartd.conf
Apr 2 16:14:48 svasp1 smartd[3210]: Configuration file /etc/smartd.conf parsed.
Apr 2 16:14:48 svasp1 smartd[3210]: Device: /dev/hda, opened
Apr 2 16:14:48 svasp1 smartd[3210]: Device: /dev/hda, not found in smartd database.
Apr 2 16:14:49 svasp1 smartd[3210]: Device: /dev/hda, is SMART capable. Adding to "monitor" list.
Apr 2 16:14:49 svasp1 smartd[3210]: Device: /dev/hda, opened
Apr 2 16:14:49 svasp1 smartd[3210]: Device: /dev/hda, not found in smartd database.
Apr 2 16:14:49 svasp1 smartd[3210]: Device: /dev/hda, enabled SMART Attribute Autosave.
Apr 2 16:14:49 svasp1 smartd[3210]: Device: /dev/hda, enabled SMART Automatic Offline Testing.
Apr 2 16:14:49 svasp1 smartd[3210]: Device: /dev/hda, is SMART capable. Adding to "monitor" list.
Apr 2 16:14:49 svasp1 smartd[3210]: Monitoring 2 ATA and 0 SCSI devices
Apr 2 16:14:50 svasp1 smartd[3212]: smartd has fork()ed into background mode. New PID=3212.
Apr 2 16:44:51 svasp1 smartd[3212]: Device: /dev/hda, SMART Usage Attribute: 193 Load_Cycle_Count changed from 96 to 95
Apr 3 09:14:51 svasp1 smartd[3212]: Device: /dev/hda, starting scheduled Short Self-Test.
Apr 3 12:14:50 svasp1 smartd[3212]: Device: /dev/hda, SMART Usage Attribute: 9 Power_On_Hours changed from 96 to 95
Apr 4 09:14:51 svasp1 smartd[3212]: Device: /dev/hda, starting scheduled Short Self-Test.

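Most of those log lines are start-up chatter. The ones worth watching over time are the attribute changes. As a small sketch, assuming the log format shown in the sample above, the following filter reduces them to the device, the attribute name and the old and new values. `attr_changes` is a hypothetical helper name.

```shell
# Hypothetical helper: read syslog text on stdin and print
# "DEVICE ATTRIBUTE OLD -> NEW" for each smartd attribute change.
attr_changes() {
    sed -n 's/.*smartd\[[0-9]*]: Device: \([^,]*\), SMART [A-Za-z]* Attribute: \([0-9]*\) \([^ ]*\) changed from \([0-9]*\) to \([0-9]*\).*/\1 \3 \4 -> \5/p'
}

# Example use:
#   attr_changes < /var/log/messages
```

For the Load_Cycle_Count entry in the sample log, this prints "/dev/hda Load_Cycle_Count 96 -> 95".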
Congratulations! You've now got all the hardware monitoring functionality in place. The final step lets you tie the sensors together in a graphical monitoring tool.
- Configure the Sensors Applet
Assuming you are using Gnome, you can add the Sensors applet to one of your Gnome panels for real-time monitoring of your hardware. Right-click on the panel of your choice and select "Add to Panel". From the list of available applets, select the "Hardware Sensors Monitor" and click the Add button.
Once added, just right-click on the applet and select Preferences to customize it. You should be able to enable readings from the hddtemp daemon as well as temperature sensors for CPU and mainboard. Additionally, you can add fan speed readouts.
Each sensor enabled in the applet can be configured to alert if its reading goes below or above a set amount, and optionally execute a specific command. This can be used to sound audible alerts, send e-mails or even perform a system shutdown. Additionally, you may configure the name of the sensor as displayed in the applet, the font size and temperature units used.
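As an illustration of the kind of command an alarm could run, here is a minimal sketch that only appends a timestamped line to a log file. Everything in it is hypothetical: the `log_alert` name, its arguments and the default log path are my own choices, and how you pass a sensor label to it depends on how you set up the alarm command in the applet.

```shell
# Hypothetical alarm handler: append a timestamped alert line to a log file.
#   $1 = free-form sensor label (e.g. "CPU Temp")
#   $2 = log file to append to (defaults to ~/hw-alerts.log)
log_alert() {
    label=$1
    logfile=${2:-"$HOME/hw-alerts.log"}
    printf '%s ALERT %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$label" >> "$logfile"
}

# Example use:
#   log_alert "CPU Temp"
```

A log like this is a gentle way to learn how often an alarm fires before wiring it to e-mail or a shutdown.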

I recommend playing simple audible alerts before doing anything with e-mail or more advanced scripting, as you need to get a "feel" for the normal operating ranges of your hardware. While you can research what the specified temperature ranges are for your CPU and drives, each sensor can have a certain amount of error in its calibration. And brief periods of high-temperature operation for a drive or CPU are not of significant concern.
Note that the sensors applet does not support display of S.M.A.R.T. data; however, you will still receive e-mail notifications about errors.
I hope you find this article useful in setting up hardware monitoring on your system. For other Linux or UNIX operating systems the steps are fundamentally the same, with the biggest issues being obtaining the necessary software for your platform and getting the right configurations in place for your particular hardware's sensors. You can find out more about setting up hardware monitoring by searching for the topics "linux hardware monitoring", "linux sensors" and "linux smartd".