A simpler introduction to mon

// published 25/4/2014, updated 30/4/2015

Due to my failure to find a succinct tutorial for mon - the general purpose monitoring program for your favourite *nix distribution - I took it upon myself to write a short one. While the man page for mon is very good and extensive, it lacks a good configuration example. And as fate would have it, my Linux distribution lacked the example file that the man page says would go into more detail.

The configuration file (in my distro located at /etc/mon/mon.cf) starts out with the global options, i.e. the directories where the various scripts reside (e.g. the alert scripts that notify the user and the monitor scripts that run the checks). The great thing about mon is that it allows you to run any kind of program to monitor a particular service - all it needs to do is return a standard exit status: greater than 0 if the check failed and 0 if everything is fine.

# Global options
alertdir                = /usr/lib/mon/alert.d
mondir                  = /usr/lib/mon/mon.d
logdir                  = /var/log/mon
historicfile            = /var/log/mon/history.log
maxprocs                = 20
histlength              = 100
randstart               = 60s
dtlogging               = yes
dtlogfile               = dtlog
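
To make the exit status convention concrete, here is a minimal sketch of a custom monitor written in Python. The script name tcp_port.monitor, the port and the function names are made up for illustration and are not part of mon. The sketch assumes that mon appends the hosts of the watched hostgroup to the monitor's command line and treats the first line of output as a summary of the failed hosts.

```python
"""Hypothetical tcp_port.monitor: a sketch of a custom mon monitor."""
import socket
import sys


def port_is_open(host, port=80, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name lookups.
        return False


def run_monitor(hosts, check=port_is_open):
    """Check every host and return (exit_status, summary_line)."""
    failed = [host for host in hosts if not check(host)]
    # Exit status 0 means everything is fine, greater than 0 means failure.
    return (1 if failed else 0), " ".join(failed)


def main(argv):
    """argv holds the hosts (assumed to be appended by mon)."""
    status, summary = run_monitor(argv)
    if summary:
        print(summary)  # first output line: the hosts that failed
    return status

# To use this as a real monitor script, end the file with:
#     sys.exit(main(sys.argv[1:]))
```

Saved as an executable file in the directory given by mondir, such a script could then be referenced in a watch with a monitor line, just like the monitors that ship with mon.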

Next we define the hostgroups. This part defines what we monitor. For example, we can monitor different webservers (here called servers) and add an arbitrary number of hosts to each group.

# Define groups of hosts to monitor
hostgroup internet 8.8.8.8
hostgroup servers 162.209.8.136 someotherhost.com


Make sure to add one blank line between your watch and hostgroup definitions. Otherwise mon will trip up and do no monitoring at all. If you are unsure whether your config file is correct, run mon -d and see if it returns any errors. If it reports that it could not bind to tcp port 2583, don't worry: mon is most likely already running as a service in the background, so just restart the service.
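
Since a forgotten blank line is easy to miss, a quick check before restarting the service can help. The following is only a rough sketch of such a check in Python, not a reimplementation of mon's parser: the function name is made up, and it only looks for the one mistake described above (a watch definition directly following a hostgroup definition, or the other way round).

```python
def find_missing_blank_lines(config_text):
    """Return the line numbers where a watch or hostgroup definition
    starts directly after a definition of the other kind."""
    problems = []
    current = None  # kind of the block we are in, None after a blank line
    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            current = None  # a blank line ends the current block
            continue
        if stripped.startswith("#"):
            continue  # comments neither start nor end a block
        if line[0].isspace():
            continue  # indented lines belong to the current block
        keyword = stripped.split()[0]
        if keyword in ("hostgroup", "watch"):
            if current is not None and current != keyword:
                problems.append(number)
            current = keyword
    return problems
```

Calling it with the contents of your config file, for example find_missing_blank_lines(open("/etc/mon/mon.cf").read()), returns an empty list if no such spot was found.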

The next section is called "watch definitions" and lists how we monitor. In this section I prefer a hierarchical structure (though to my knowledge the order doesn't matter). We start by monitoring the most basic thing: do we have access to the internet? We check this by pinging one of Google's DNS servers (located at 8.8.8.8), which I presume has a very high uptime. We further define a service, called ping, which allows us to reference that test later and make further alerts dependent on the success of this particular one (internet access). alertafter determines how many times in a row the test may fail before we notify the user. In this case we notify by sending an email via mail.alert, and only once, see numalerts 1 (and hope that the email will be sent once we regain internet connection - good for monitoring the uptime of your ISP).

# Define watches

watch internet
        service ping
                description check internet access by pinging google's DNS server
                interval 1m
                monitor fping.monitor
                period wd {Mon-Sun}
                        alertafter 1
                        alert mail.alert -f monitor@jcfrei.com johnny@mail.com
                        numalerts 1
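
To illustrate how alertafter and numalerts interact, here is a small simulation in Python. It is a simplified model based on my reading of the man page (alert after N consecutive failures, send at most M alerts per failure, reset on success), not mon's actual code, and the function name is made up.

```python
def alert_runs(results, alertafter=1, numalerts=1):
    """results is a list of booleans, one per monitor run (True = success).
    Returns the indices of the runs at which an alert would be sent."""
    alerts = []
    consecutive_failures = 0
    alerts_sent = 0
    for run, ok in enumerate(results):
        if ok:
            # A success ends the failure and resets both counters.
            consecutive_failures = 0
            alerts_sent = 0
            continue
        consecutive_failures += 1
        if consecutive_failures >= alertafter and alerts_sent < numalerts:
            alerts.append(run)
            alerts_sent += 1
    return alerts
```

With the settings from the watch above (alertafter 1, numalerts 1), the sequence success, failure, failure, success, failure yields one alert at the first failure of each outage, so alert_runs([True, False, False, True, False]) returns [1, 4].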

Next we want to check if our webserver is still working. As we can see, the watch for the hostgroup servers has an additional property: depend. As mentioned before, sending an alert for a non-responding webserver only makes sense if we actually have an internet connection. We check this by making sure the test in hostgroup internet with the service ping ran successfully.

watch servers
        service ping
                description check if our servers are responding
                interval 1m
                monitor fping.monitor
                depend internet:ping
                period wd {Mon-Sun}
                        alertafter 1
                        alert mail.alert -f monitor@jcfrei.com johnny@mail.com
                        numalerts 1
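
The effect of depend can be pictured with a few lines of Python. This is only a sketch of the idea as described above (no alert for the servers while the internet check itself is failing). It handles a single group:service reference, treats a dependency without a known result as fine, and uses made-up names throughout, so it should not be read as mon's real evaluation logic.

```python
def should_alert(service_ok, depend, last_results):
    """Decide whether a failed service should trigger an alert.

    depend is a string like "internet:ping" or None;
    last_results maps (group, service) to True (ok) or False (failed).
    """
    if service_ok:
        return False  # nothing to report
    if depend is None:
        return True  # no dependency, so every failure counts
    group, service = depend.split(":", 1)
    # Assumption: a dependency with no recorded result counts as ok.
    dependency_ok = last_results.get((group, service), True)
    return dependency_ok
```

If the ping to 8.8.8.8 has failed, should_alert(False, "internet:ping", {("internet", "ping"): False}) returns False, and the only mail you get is the one about the lost internet connection.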

And we're done! (Depending on your particular setup, checking for an HTTP response might make even more sense. Use http.monitor to check whether your webpage is online.)