Getting NetSaint to monitor my RAID array was not as simple as getting it to monitor a regular disk. I was already using netsaint_statd to monitor remote machines; they are all set up so I can see load, process count, users, and disk space usage. I will extend netsaint_statd to monitor RAID status as well.
This additional feature involves several distinct steps:
- Create a Perl script for use by netsaint_statd to monitor the RAID
- Extend netsaint_statd to use that script
- Add RAID to the services monitored by NetSaint
RAID Perl script
As the basis for the Perl script, I used check_users.pl, as supplied with netsaint_statd, and from it I created check_adptraid.pl. I installed that script into the same directory as all the other netsaint_statd scripts (/usr/local/libexec/netsaint/netsaint_statd).
If you look at this script, you'll see that it checks for three major status values:
# Map the status reported by the remote host to a NetSaint state.
if ($servanswer =~ m%^Reconstruct%) {
    $state = "WARNING";     # array is rebuilding
} elsif ($servanswer =~ m%^Degraded%) {
    $state = "CRITICAL";    # array has lost redundancy
} elsif ($servanswer =~ m%^Optimal%) {
    $state = "OK";          # array is healthy
} else {
    $state = "CRITICAL";    # unrecognized reply: assume the worst
}
$answer = $servanswer;      # report the raw status text in every case
I have decided that Degraded and unrecognized results will be CRITICAL, Optimal will be OK, and Reconstruct will be a WARNING.
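The if/else chain above is really an ordered list of prefix rules with a CRITICAL fallback. The sketch below expresses the same decision as a small rule table in Perl, which makes it easier to add another status later. It is not the contents of check_adptraid.pl: `raid_state` is a hypothetical helper, and the exit codes assume the usual NetSaint plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Conventional plugin exit codes (assumed; check your NetSaint plugins).
my %EXIT_CODE = (OK => 0, WARNING => 1, CRITICAL => 2);

# Prefix rules, tried in order, mirroring the if/elsif chain.
my @RULES = (
    [ qr/^Reconstruct/, 'WARNING'  ],
    [ qr/^Degraded/,    'CRITICAL' ],
    [ qr/^Optimal/,     'OK'       ],
);

# Hypothetical helper: returns (state, exit code) for one status line.
# Anything that matches no rule is treated as CRITICAL.
sub raid_state {
    my ($servanswer) = @_;
    for my $rule (@RULES) {
        my ($pattern, $state) = @$rule;
        return ($state, $EXIT_CODE{$state}) if $servanswer =~ $pattern;
    }
    return ('CRITICAL', $EXIT_CODE{CRITICAL});
}

for my $line ('Optimal', 'Reconstruct 85%', 'Degraded', 'no reply') {
    my ($state, $code) = raid_state($line);
    print "$line => $state (exit $code)\n";
}
```

Because the rules are tried in order, the table behaves exactly like the original chain, including treating anything unexpected as CRITICAL.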
The next step is to modify netsaint_statd to use this newly added script.
netsaint_statd patch
The patch for netsaint_statd is available from here. Apply the patch like this:
cd /usr/local/libexec/netsaint/netsaint_statd
patch < path.to.patch.you.downloaded
Now that you have modified the daemon, you need to kill it and restart it:
# ps auwx | grep netsaint_statd
root 28778 0.0 0.5 3052 2460 ?? Ss 6:56PM 0:00.32 /usr/bin/perl /usr/local/libexec/netsaint/netsaint_statd/netsaint_statd
# kill -TERM 28778
# /usr/local/etc/rc.d/netsaint_statd.sh start
#
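If you restart the daemon often, picking the PID out of ps by hand gets tedious. Here is a minimal Perl sketch, assuming the BSD-style `ps auwx` layout shown above, where the PID is the second column; `statd_pids` is a hypothetical helper, not part of netsaint_statd.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given `ps auwx` output, return the PIDs of any
# netsaint_statd processes (the PID is the second whitespace-separated
# column). The grep process itself is skipped.
sub statd_pids {
    my ($ps_output) = @_;
    my @pids;
    for my $line (split /\n/, $ps_output) {
        next unless $line =~ /netsaint_statd/;
        next if $line =~ /\bgrep\b/;
        my (undef, $pid) = split ' ', $line;
        push @pids, $pid if defined $pid && $pid =~ /^\d+$/;
    }
    return @pids;
}

# On a live system you could feed it real output and signal the result,
# then start the daemon again with the rc.d script:
#   my @pids = statd_pids(scalar `ps auwx`);
#   kill 'TERM', @pids;
my $sample = "root 28778 0.0 0.5 3052 2460 ?? Ss 6:56PM 0:00.32 "
           . "/usr/bin/perl /usr/local/libexec/netsaint/netsaint_statd/netsaint_statd\n";
print join(' ', statd_pids($sample)), "\n";   # prints 28778
```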
Add RAID to the services monitored by NetSaint
The remote RAID box is now ready to tell us all about its RAID status. It's time to test it:
# cd /usr/local/libexec/netsaint/netsaint_statd
# perl check_adptraid.pl polo
Reconstruct 85%
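The reply is the status word followed, here, by a percentage, presumably how far the reconstruction has progressed. Because the checks in check_adptraid.pl only match the start of the line, the percentage passes straight through into the message. If you want the number on its own, a hypothetical `parse_raid_status` helper could split it out, assuming the reply always has the form `Word` or `Word NN%`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a status line such as "Reconstruct 85%" into
# the status word and, if present, the percentage (undef when absent).
sub parse_raid_status {
    my ($line) = @_;
    my ($status, $percent) = $line =~ /^(\w+)(?:\s+(\d+)%)?/;
    return ($status, $percent);
}

my ($status, $percent) = parse_raid_status('Reconstruct 85%');
print "$status: $percent% complete\n";   # prints "Reconstruct: 85% complete"
```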
That looks right to me! Now I'll show you what I added to NetSaint to use this
new tool.
First, I'll add the service definition to
/usr/local/etc/netsaint/hosts.cfg:
service[polo]=RAID;0;24x7;3;2;1;raid-admins;120;24x7;1;1;1;;check_adptraid.pl
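That service line is a list of positional, semicolon-separated fields, which is easy to misread. As a sanity check, a small Perl sketch can label each field. The field names below are my understanding of the NetSaint service definition format and should be treated as an assumption; check them against the documentation for your NetSaint version. `parse_service` is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed field order for service[host]=... lines in hosts.cfg.
my @SERVICE_FIELDS = qw(
    description volatile check_period max_attempts check_interval
    retry_interval notification_group notification_interval
    notification_period notify_recovery notify_critical notify_warning
    event_handler check_command
);

# Hypothetical helper: turn one service[...] line into a hash reference.
sub parse_service {
    my ($line) = @_;
    my ($host, $rest) = $line =~ /^service\[([^\]]+)\]=(.*)$/
        or return;
    my %svc = (host => $host);
    # A limit of -1 keeps empty fields, such as the blank event handler.
    @svc{@SERVICE_FIELDS} = split /;/, $rest, -1;
    return \%svc;
}

my $svc = parse_service(
    'service[polo]=RAID;0;24x7;3;2;1;raid-admins;120;24x7;1;1;1;;check_adptraid.pl');
printf "%s/%s -> %s, notify %s\n",
    $svc->{host}, $svc->{description},
    $svc->{check_command}, $svc->{notification_group};
```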
I have set up a new notification_group (raid-admins) because I want to be
notified via text message to my cellphone when the RAID array has a problem.
The contact group I created was:
contactgroup[raid-admins]=RAID Administrators;danphone,dan
In this case, I want contacts danphone and dan to be notified.
Here are the contacts that belong to the above contact group (the lines below may be wrapped, but in the NetSaint configuration there should be only two lines):
contact[dan]=Dan Langille;24x7;24x7;1;1;0;1;1;0;notify-by-email;host-notify-by-email;dan;
contact[danphone]=Dan Langille;24x7;24x7;1;1;0;1;1;0;notify-xtrashort;notify-xtrashort;dan;6135551212@pcs.example.com;
This means that I will be emailed, and an email will also be sent to my cellphone.
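To double-check who will hear about a RAID problem, the contactgroup and contact lines above can be resolved mechanically. This Perl sketch assumes the NetSaint contact layout in which, counting from zero after the `=`, field 9 is the service notification command, field 11 the email address and field 12 the pager address; that layout is an assumption, so check it against your NetSaint documentation. `notified` is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The contactgroup and contact lines from the configuration above.
my @config = (
    'contactgroup[raid-admins]=RAID Administrators;danphone,dan',
    'contact[dan]=Dan Langille;24x7;24x7;1;1;0;1;1;0;notify-by-email;host-notify-by-email;dan;',
    'contact[danphone]=Dan Langille;24x7;24x7;1;1;0;1;1;0;notify-xtrashort;notify-xtrashort;dan;6135551212@pcs.example.com;',
);

my (%members, %contact);
for my $line (@config) {
    if ($line =~ /^contactgroup\[([^\]]+)\]=[^;]*;(.*)$/) {
        # Skip the alias; the second field lists the member contacts.
        my ($group, $list) = ($1, $2);
        $members{$group} = [ split /,/, $list ];
    } elsif ($line =~ /^contact\[([^\]]+)\]=(.*)$/) {
        my ($name, $fields) = ($1, $2);
        # Assumed layout (0-based): 9 = service notification command,
        # 11 = email address, 12 = pager address (absent for "dan").
        my @f = split /;/, $fields;
        $contact{$name} = { command => $f[9], email => $f[11], pager => $f[12] };
    }
}

# Hypothetical helper: describe how each member of a group is reached.
sub notified {
    my ($group) = @_;
    my @out;
    for my $name (@{ $members{$group} || [] }) {
        my $c = $contact{$name} or next;
        my $pager = defined $c->{pager} ? $c->{pager} : '(none)';
        push @out, "$name: command=$c->{command}, email=$c->{email}, pager=$pager";
    }
    return @out;
}

print "$_\n" for notified('raid-admins');
```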
After restarting NetSaint, I was able to see the new RAID service on my NetSaint web page.
If your RAID is really important to you, then you will definitely want to test the notification via cellphone.
I did. I know it works. But I hope it never has to be used.