A simpler introduction to mon
// published 25/4/2014, updated 30/4/2015
Having failed to find a succinct tutorial for mon - the general-purpose monitoring program for your favourite *nix distribution - I took it upon myself to write a short one. While the man page for mon is very good and extensive, it lacks a good configuration example. And as fate would have it, my Linux distribution did not ship the advertised example file, which would have gone into more detail.
The configuration file (in my distro located at /etc/mon/mon.cf) starts out with the global options, i.e. the directories where the various scripts reside (e.g. the alert scripts that notify the user and the monitor scripts that run the checks). The great thing about mon is that it allows you to run any kind of program to monitor a particular service - all it needs to do is return a standard exit status: 0 if everything is fine and greater than 0 if the check failed.
# Global options
alertdir = /usr/lib/mon/alert.d
mondir = /usr/lib/mon/mon.d
logdir = /var/log/mon
historicfile = /var/log/mon/history.log
maxprocs = 20
histlength = 100
randstart = 60s
dtlogging = yes
dtlogfile = dtlog
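To make the exit-status convention concrete, here is a minimal sketch of a custom monitor written in Python. The file name tcp_check.monitor, the port and the timeout are my own assumptions for illustration and are not part of mon. As I understand the man page, mon appends the members of the hostgroup as command line arguments and treats the first line of output as the summary, so the sketch prints the failed hosts there.

```python
#!/usr/bin/env python3
"""Hypothetical tcp_check.monitor: a sketch of a custom mon monitor."""
import socket
import sys

PORT = 80       # assumed port to probe, not taken from the article
TIMEOUT = 3.0   # assumed timeout in seconds


def is_reachable(host, port=PORT, timeout=TIMEOUT):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run(hosts, check=is_reachable):
    """Check every host and return (exit_status, summary_line).

    exit_status is 0 when all hosts pass and 1 when at least one fails;
    summary_line lists the failed hosts separated by spaces.
    """
    failed = [host for host in hosts if not check(host)]
    return (1 if failed else 0), " ".join(failed)


# Only act when hosts were passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    status, summary = run(sys.argv[1:])
    print(summary)
    sys.exit(status)
```

Dropped into the directory named by mondir and made executable, a script like this could be referenced from a watch with a monitor line, in the same way fping.monitor is used further down.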
Next we define the hostgroups. This part defines what we monitor. For example, we can monitor several web servers (the group called servers below) and list an arbitrary number of hosts in each group.
# Define groups of hosts to monitor
hostgroup internet 8.8.8.8
hostgroup servers 162.209.8.136 someotherhost.com
Make sure to leave one blank line between your hostgroup and watch definitions. Otherwise mon will trip up and do no monitoring at all. If you are unsure whether your config file is correct, run mon -d and see if it returns any errors (if it reports that it could not bind to TCP port 2583, don't worry: mon is most likely already running as a service in the background, so just restart the service).
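The blank-line rule is easy to get wrong, so a small pre-flight check can help before restarting the service. The following is only a sketch under my own assumptions: the function name find_missing_blank_lines is made up, it looks at nothing but the hostgroup and watch keywords, and it assumes that a comment line does not count as a blank line. It is not a replacement for mon's own parser.

```python
def find_missing_blank_lines(config_text):
    """Return 1-based numbers of watch lines that follow a hostgroup
    definition without a blank line in between."""
    problems = []
    in_hostgroup = False
    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            # A blank line ends the current hostgroup definition.
            in_hostgroup = False
            continue
        if stripped.startswith("#"):
            # Assumption: comments neither start nor end a definition.
            continue
        keyword = stripped.split()[0]
        if keyword == "hostgroup":
            in_hostgroup = True
        elif keyword == "watch":
            if in_hostgroup:
                problems.append(number)
            in_hostgroup = False
    return problems
```

Reading /etc/mon/mon.cf into a string and passing it to this function returns an empty list when every watch is separated from the preceding hostgroup by a blank line.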
The next section is called "watch definitions" and lists how we monitor. In this section I prefer a hierarchical structure (though to my knowledge the order doesn't matter). We start by monitoring the most basic thing: do we have access to the internet? We do this by sending a ping request to one of Google's DNS servers (located at 8.8.8.8), which I presume has a very high uptime. We further define a service, called ping, which allows us to reference that test later and make further alerts dependent on the success of this particular one (internet access). alertafter determines how many consecutive failures we tolerate before notifying the user. In this case the notification is a mail.alert sent to my email address, and numalerts 1 makes sure it is sent only once per outage (we hope that the email will go out once we regain the internet connection - good for monitoring the uptime of your ISP).
# Define watches
watch internet
    service ping
        description check internet access by pinging google's DNS server
        interval 1m
        monitor fping.monitor
        period wd {Mon-Sun}
            alertafter 1
            alert mail.alert -f monitor@jcfrei.com johnny@mail.com
            numalerts 1
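To see how alertafter and numalerts interact, here is a simplified model in Python. It is my own sketch of the behaviour described above, not mon's actual implementation: the function name alerts_sent is made up, and the model ignores related options such as alertevery.

```python
def alerts_sent(results, alertafter=1, numalerts=1):
    """Return the indices of the checks that would trigger an alert.

    results is a sequence of booleans, one per check (True = success).
    An alert fires once the service has failed alertafter times in a
    row, and at most numalerts alerts fire until the next success.
    """
    consecutive_failures = 0
    alerts_this_outage = 0
    sent = []
    for index, ok in enumerate(results):
        if ok:
            # A success resets both counters.
            consecutive_failures = 0
            alerts_this_outage = 0
            continue
        consecutive_failures += 1
        if consecutive_failures >= alertafter and alerts_this_outage < numalerts:
            alerts_this_outage += 1
            sent.append(index)
    return sent
```

With the settings from the watch above (alertafter 1, numalerts 1), a run of three failed pings produces a single email at the first failure, and a later outage produces a new one.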
Next we want to check if our web servers are still working. As we can see, the watch for the hostgroup servers has an additional property: depend. As mentioned before, sending an alert for a non-responding web server only makes sense if we actually have an internet connection. We check this by making sure that the service ping in the watch for the hostgroup internet ran successfully.
watch servers
    service ping
        description check if our servers are responding
        interval 1m
        monitor fping.monitor
        depend internet:ping
        period wd {Mon-Sun}
            alertafter 1
            alert mail.alert -f monitor@jcfrei.com johnny@mail.com
            numalerts 1
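The effect of depend can be modelled in a few lines of Python. This is a simplified sketch based on my reading of the setup above: the function name should_alert and the dictionaries are made up for illustration, and real depend lines accept more complex expressions than a plain list of group:service names.

```python
def should_alert(service, status, depends):
    """Decide whether a failing service should raise an alert.

    status maps "group:service" names to True (ok) or False (failing).
    depends maps a service to the services it depends on. An alert is
    only raised if the service fails while all its dependencies are ok.
    """
    if status[service]:
        return False  # nothing is wrong, so nothing to report
    return all(status[dep] for dep in depends.get(service, []))


# Mirrors the depend internet:ping line in the watch for servers.
depends = {"servers:ping": ["internet:ping"]}
```

If both internet:ping and servers:ping fail, only the internet watch alerts. The servers watch alerts only when the servers are down while the internet connection is up.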
And we're done! (Depending on your particular setup, checking for an HTTP response might make even more sense. Use http.monitor to check whether your webpage is online.)