Scheduled disk status checks

Having any number of disks connected to your system puts you in charge of monitoring their health, and acting in the right way when a disk is about to fail is crucial.

A tool that runs on a schedule and notifies you on screen when a disk fails a health check could prevent a disaster.
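The scheduling itself can be left to cron. As a rough sketch, assuming the monitoring script were saved under the hypothetical path /usr/local/sbin/disk-check.sh and that an hourly check is often enough, the helper below (cron_entry is an illustrative name, not part of any tool) prints a line in the system crontab format, which carries a user field:

```shell
#!/bin/bash
# Hypothetical install path for the monitoring script; adjust as needed.
SCRIPT_PATH="/usr/local/sbin/disk-check.sh"

# Print a system crontab line (the /etc/cron.d format includes a user
# field) that runs the given script as root at minute 0 of every hour.
cron_entry() {
    printf '0 * * * * root %s\n' "$1"
}

cron_entry "$SCRIPT_PATH"

# As root, the line could be saved with something like:
#   cron_entry "$SCRIPT_PATH" > /etc/cron.d/disk-check
```

The hourly interval is only an assumption; any schedule cron accepts would do.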

Let's have a look at a script that could be used as a starting point to create an always-on disk monitoring tool.

The script below, to be run as root, follows a five-step procedure:

  1. get all the active X11/Xorg displays
  2. get the list of users with an active X11/Xorg session
  3. scan the available disks
  4. check whether each disk passes the SMART overall-health self-assessment
  5. report disks with issues by opening a popup window for each logged-in user
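Step 4 can be tried in isolation, without touching a real disk. The sketch below assumes the health line has the form smartctl prints for ATA disks; health_check_failed is a name made up for this example, and the two sample lines stand in for real smartctl output.

```shell
#!/bin/bash
# Read "smartctl -H" output on stdin and succeed (exit 0) when a
# "test result" line is present and does not say PASSED.
health_check_failed() {
    grep -i "test result" | grep -qv "PASSED"
}

# Sample lines in the form smartctl prints for ATA disks.
ok_output="SMART overall-health self-assessment test result: PASSED"
bad_output="SMART overall-health self-assessment test result: FAILED!"

for sample in "$ok_output" "$bad_output"; do
    if printf '%s\n' "$sample" | health_check_failed; then
        echo "needs attention: $sample"
    else
        echo "looks fine: $sample"
    fi
done
```

Output with no "test result" line at all counts as "looks fine" here, so a disk that smartctl cannot read would go unnoticed.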

Here is the script:

#!/bin/bash

#
# make sure you have installed:
# 1. base tools : id | ps | grep | cut | awk | sort | ls | sudo
# 2. extra tools: smartctl (smartmontools) | zenity
#

# the script can only be run as root
if [ "$(id -u)" != "0" ]; then
    echo "this script must be run as root" >&2
    exit 1
fi

# get the list of all X11/Xorg displays in use, read from the environment
# that "ps e" prints after each command line; every display name contains
# a colon (":0", ":0.0", "localhost:10.0"), anything else is discarded
DISPLAYS="$( LANG= ps eax | grep -o ' DISPLAY=[^ ]*' | cut -d= -f2 | grep ':' | sort -u )"

# get the list of all users owning a process with DISPLAY set;
# the [D] keeps awk from matching its own command line
USERS="$( LANG= ps eaux | awk '/ [D]ISPLAY=/ {print $1}' | sort -u )"

# get list of all disks
DISKS="$( ls /dev/nvme[0-9]n[0-9] /dev/sd[a-z] /dev/sd[a-z][a-z]  /dev/hd[a-z] /dev/hd[a-z][a-z] 2>/dev/null )"

# loop through found disks
for DISK in $DISKS ;
do

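    # get disk status: keep the "test result" line only when it is not PASSED
    # (SCSI/SAS disks print "SMART Health Status" instead and are not covered)
    STATUS="$( LANG= smartctl -H "${DISK}" 2>/dev/null | grep -i "test result" | grep -v PASSED )"

    # if status is not PASSED then notify all the users !!!
    if [ -n "$STATUS" ]; then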
        for USERX in $USERS ;
        do
            for DISPLAYX in $DISPLAYS ;
            do
                DISPLAY=$DISPLAYX \
                    sudo -u ${USERX} --preserve-env=DISPLAY \
                    zenity --error --text="Disk errors on ${DISK}" --icon="error" \
                2>/dev/null >/dev/null &
            done
        done
    fi

done
exit 0
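Parsing the text output is not the only option. According to the smartctl man page, bit 3 (value 8) of its exit status is set when the SMART status check returned "DISK FAILING", so the exit status could be tested instead of the "test result" line. The following is a minimal sketch of that alternative, not what the script above does; smart_status_failing is a name chosen for this example.

```shell
#!/bin/bash
# Bit 3 (value 8) of smartctl's exit status is documented in smartctl(8)
# as: SMART status check returned "DISK FAILING".
SMART_FAILING_BIT=8

# Succeed (exit 0) when the given exit status has the failing bit set.
smart_status_failing() {
    (( ($1 & SMART_FAILING_BIT) != 0 ))
}

# Usage against a real disk (needs root and smartmontools):
#   smartctl -H /dev/sda >/dev/null 2>&1
#   smart_status_failing "$?" && echo "/dev/sda is failing"

# Demonstration with made-up status values, no disk required.
for rc in 0 8 72; do
    if smart_status_failing "$rc"; then
        echo "status $rc: failing"
    else
        echo "status $rc: not failing"
    fi
done
```

The other bits of the exit status report different conditions, such as a device that could not be opened, so a complete tool would probably want to look at those as well.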

As always ... the script comes with no warranties ;-)