kanla is a daemon which periodically checks whether your website, mail server, etc. are still up and running.

In case a health check fails, kanla will notify you via jabber (XMPP).

Focus of kanla lies on being light-weight, being simple, using a sane configuration file, being well-documented.

1. Introduction

Many people run personal websites or websites of their pet projects. Except for professional sysadmins, none of those I asked ran any kind of software to notify them about failures. Since nobody disagreed with me about the idea being helpful, this suggests that existing software is too complex, complicated to setup, under-documented or for some other reason undesirable.

I myself wanted to be notified — by software, not by users — when http://i3wm.org is not reachable anymore. However, the software I looked at could not be configured to verify the actual page contents (as opposed to only the HTTP status) and did not even support IPv6 (come on, it’s 2013).

My conclusion was that writing a simple program myself would be faster than trying to re-implement plugins for existing monitoring/alerting software in such a way that I’d be happy with them.

kanla is the fruit of this effort. It is intentionally simple and focuses on small-scale alerting. This means it will NOT generate uptime statistics, or execute certain actions when some condition is met, or handle pager alerts and on-call schedules. Instead, it verifies what you describe and whenever something is wrong, it sends you a message via jabber (XMPP).

1.1. Terminology

kanla instance

A process of the kanla binary, running with precisely one configuration. You can have multiple instances running and each instance usually has several plugins running.

plugin

A small program which verifies that things are good. For example, the "http" plugin retrieves websites and verifies their content. If anything goes wrong you will get an alert.

alert

The message which kanla sends to you when things go wrong.

2. Installation

To install kanla, you should by all means use your distribution’s package management system.

It is intentional that there is no instruction for installing from source.

If there is no package for your distribution, consider making one. The Debian packaging serves as reference packaging.

2.1. Locations

kanla itself should be installed as /usr/bin/kanla.

kanla ships with a number of plugins. To find the plugin location for your system, use:

perl -MFile::ShareDir -E 'say File::ShareDir::dist_dir("kanla")'

Custom plugins can be placed in /usr/local/share/kanla/ or /usr/local/lib/kanla/, depending on whether they are binary files or scripts.

A plugin is just an executable file (implemented in any language, but preferably Perl) which produces output in a certain format on stderr when things go wrong. Plugins are identified by their file name, e.g. "http".

Configuration files are placed in /etc/kanla/ with the default configuration file being named default.cfg. This file — by default — includes every .cfg file in /etc/kanla/default.d/ which is where you will find example configurations for every plugin.

3. Configuration

The main kanla configuration file is called /etc/kanla/default.cfg. It contains the jabber (XMPP) account configuration, the default alert destination (e.g. your jabber id) and includes /etc/kanla/default.d/*.cfg.

After modifying the jabber (XMPP) account configuration, you need to enable at least one plugin by removing the comment signs (#) in front of every line of its configuration file. The provided files serve as a minimum useful example (while this documentation poses a reference) and will not change often so you can modify them without being bothered by config file conflicts on upgrades.

It goes without saying that configuration files are encoded as UTF-8 and UTF-8 is supported throughout kanla.

Note SysVinit will not directly report error messages when (re)starting kanla. Ensure kanla is running by checking service kanla status and/or look at /var/log/kanla.log for error messages

3.1. Quick example

A minimum useful kanla configuration:

<jabber>
    jid      = "[email protected]"
    password = "kV9eJ4LZ9KRYOCec5W2witq"
</jabber>

send_alerts_to = <<EOT
[email protected]
EOT

<monitor i3wm website>
    plugin = http
    url    = "http://www.i3wm.org/"
    url    = "http://i3wm.org/"
</monitor>

3.2. Jabber configuration

Jabber configuration is done with one (or multiple) jabber block(s). Each block needs to have at least the jid and password attribute. The following attributes can be configured:

jid

The jabber id of this account, e.g. [email protected].

password

The password for this account.

host

The hostname of the jabber server to which to connect to. It is normally NOT necessary to specify this, but some networks actively break SRV record resolution.

port

The port of the jabber server, if non-standard.

Simple jabber example:

<jabber>
    jid      = "[email protected]"
    password = "kV9eJ4LZ9KRYOCec5W2witq"
</jabber>

Complex jabber example:

<jabber>
    jid      = "[email protected]"
    password = "kV9eJ4LZ9KRYOCec5W2witq"
</jabber>

<jabber>
    jid      = "[email protected]"
    password = "kV9eJ4LZ9KRYOCec5W2witq"
    host     = "jabber.example.net"
    port     = 5222
</jabber>

There is no default value for this option, therefore you have to configure this.

Note kanla will send an XEP-0199 ping to the server every 60 seconds. If the server does not reply within 60 seconds, the connection is considered dead and will be reconnected.

3.3. Main configuration

3.3.1. send_alerts_to

A list of jabber ids (line-separated) to which alerts will be sent. This can be overwritten on a per-plugin basis. Please use the heredoc syntax, otherwise extending this list will not work (a limitation of the Config::General module).

send_alerts_to example:

send_alerts_to = <<EOT
[email protected]
[email protected]
EOT

There is no default value for this option, therefore you have to configure this.

Note

In order to maximize redundancy, you can (and should) use multiple jabber ids. Of course, they should not use the same infrastructure, so use two different jabber servers and clients (if possible).

In case you are using Android, consider using Google Talk (built-in) and Xabber (supports XEP-0184 message receipts) with a non-google account.

3.3.2. consecutive_failures

Sometimes, the problem detected by a plugin is transient. As an example, maybe the IRC plugin tries to connect in precisely the same moment in which 5 other connections are made and thus the server’s listen queue is full. The next check will succeed and there was only a brief interruption of service from your user’s point of view, if at all. Human intervention is not necessary, so it would be good if the human can stay out of the loop entirely.

When setting consecutive_failures to any value higher than 1 (the default), kanla will inhibit delivering the plugin’s messages until that specific failure has occurred $consecutive_failures times.

Uniquely identifying a specific failure is the responsibility of the plugin, see [example_alert_output].

consecutive_failures example:

consecutive_failures = 2

<monitor i3wm website>
    plugin   = http
    url      = "http://www.i3wm.org/"
    # Effectively notifies me once www.i3wm.org
    # is down for at least 60 seconds.
    interval = 30
</monitor>

3.3.3. silenced_by

The silenced_by directive has two use cases:

  1. Inhibit all alerts when the internet connection is down.

  2. Only send a single alert when an entire machine fails.

The first use case can be accomplished by monitoring a very reliable target and using a global silenced_by setting on that target’s id:

silenced_by example 1:

consecutive_failures = 2
silenced_by = http.http://www.google.com

<monitor i3wm website>
    plugin = http
    url    = http://www.i3wm.org/
</monitor>

<monitor internet connection check>
    plugin = http
    url    = http://www.google.com
</monitor>
Note

To avoid race conditions, it is a good idea to use consecutive_failures = 2 and have interval settings that ensure the reliable site is checked at least as often as the other targets. Otherwise, the silenced_by setting might not have any effect, because the error for the reliable site might arrive after the error for the unreliable site.

3.4. Plugin configuration

There are some configuration directives which are valid for every plugin. These directives are:

plugin

Specifies which plugin should be used. This is simply the base filename of the plugin, e.g. "http".

send_alerts_to

Overwrites the main configuration’s send_alerts_to for this plugin, see [send_alerts_to]. Using variables, you can also amend instead of overwrite, see [variables].

interval

Controls how often the plugin verifies that things are good. This configuration directive is merely a convention, that is, plugins don’t have to use it (e.g. in case the plugin needs a more detailed interval configuration), but it is strongly recommended.

silenced_by

All plugin output is discarded while the specified id is present. See [silenced_by].

3.5. Plugin: http

The http plugin periodically retrieves the specified URL(s) and sends alerts if the HTTP status is not 200 or anything else went wrong during retrieval.

3.5.1. URL

The URL of the site to retrieve. This can be anything which the AnyEvent::HTTP module accepts. In addition it allows to embed username and password for HTTP basic access authentication. The standard URI syntax is used, e.g. http://user:[email protected]/.

You can specify this configuration directive multiple times.

Full URL example:

<monitor i3wm website>
    plugin = http
    url    = "http://www.i3wm.org/"
</monitor>

There is no default value for this option, therefore you have to configure this.

You can specify multiple URLs:

Multiple URL example:

<monitor i3wm website>
    plugin = http
    url    = http://www.i3wm.org/
    url    = http://www.i3-wm.org/
</monitor>

3.5.2. Family

The address family (IPv4, IPv6 or both) which should be used when retrieving the URL(s).

The default is to try both IPv4 and IPv6.

Full family example:

<monitor i3wm website>
    plugin = http
    family = ipv4 | ipv6
    url    = "http://www.i3wm.org/"
</monitor>

<monitor slashdot>
    plugin = http
    # slashdot doesn’t even support IPv6…
    family = ipv4
    url    = "http://slashdot.org/"
</monitor>
Note The absence of a AAAA DNS record is treated as an error (to catch DNS misconfiguration problems). Therefore, if your site is IPv4-only, you need to explicitly configure that. This is intentional — it’s 2013, your site should really be dual-stacked by now.

3.5.3. Body

You can make kanla verify the page it received via HTTP actually contains expected content. That is, sometimes webservers fail even though they return the HTTP 200 status, for example because a poorly written PHP script could not establish a database connection.

You can specify this option multiple times, all specified checks need to pass.

You can either specify a Perl regular expression (e.g. /foo/, it MUST be enclosed in forward slashes) or a custom verification function (MUST start with "sub {"), see the example below. Functions will be called with the full HTTP body string as first argument.

This option has no default value, meaning no checks on the body are performed.

Full body example:

<monitor i3wm website>
    plugin = http
    url    = "http://www.i3wm.org/"

    # Check for the intro text.
    body   = "/tiling window manager/"

    # Check for version 4.x in the page.
    body   = <<EOT
/4.\d<\/span>/
EOT
</monitor>

<monitor i3wm website>
    plugin = http
    url    = "http://www.i3wm.org/"

    # Stupid check for the length of the returned page.
    # Only good for demonstration purposes.
    # Note how variables need to be escaped because
    # otherwise they are interpolated.
    body   = <<EOT
sub {
    my (\$body) = @_;
    return (length(\$body) == 5137);
}
EOT
</monitor>

3.6. Plugin: smtp

The smtp plugin periodically connects to an SMTP server and sends alerts if the server does not send the SMTP greeting (220 …) within 20 seconds after connecting or anything else went wrong on the DNS/TCP level.

3.6.1. Host

The host (may include a port) of the smtp server to check. Every syntax which AnyEvent::Socket’s parse_hostport function understands is accepted (this covers all common IPv4 and IPv6 formats).

You can specify this configuration directive multiple times.

Full Host example:

<monitor i3 smtp>
    plugin = smtp
    host   = infra.in.zekjur.net
</monitor>

<monitor i3 smtp>
    plugin = smtp
    family = ipv6
    host   = "[2001:4d88:100e:1::2]:submission"
</monitor>

3.6.2. Family

3.6.3. Timeout

The timeout (in seconds) for each connection attempt (via TCP) and for receiving the greeting.

The default value is a timeout 20 seconds. This means it is okay if the TCP connection can be established after 9 seconds and the SMTP greeting is read after another 9 seconds.

Full timeout example:

<monitor i3 smtp>
    plugin   = smtp

    # Gee, we could really use some newer hardware.
    # Until then, cut the server some slack:
    # Check only every 5 minutes,
    # wait up to 60 seconds for connections
    # and for reading the greeting.
    timeout  = 60
    interval = 300

    host     = infra.in.zekjur.net
</monitor>

3.7. Plugin: irc

The irc plugin periodically logs onto an IRC network and sends alerts if the server does not let you log in properly (it looks for the "001" IRC message) or anything else went wrong on the DNS/TCP level.

3.7.1. ircd

The host (may include a port) of the IRC server to check. Every syntax which AnyEvent::Socket’s parse_hostport function understands is accepted (this covers all common IPv4 and IPv6 formats).

Full ircd example:

<monitor TWiCEiRC>
    plugin = irc
    ircd   = irc.twice-irc.de
</monitor>

3.7.2. Family

3.7.3. Timeout

3.8. Plugin: git

The git plugin periodically checks whether the specified git URL works correctly (by receiving the current HEAD revision) and sends alerts if anything comes back which does not look like a git HEAD revision or anything else went wrong on the DNS/TCP level.

3.8.1. url

A git URL of the git-daemon to check. Every syntax which AnyEvent::Socket’s parse_hostport function understands is accepted (this covers all common IPv4 and IPv6 formats).

Full url example:

<monitor i3 git>
    plugin = git
    url    = git://code.i3wm.org/i3
</monitor>

3.8.2. Family

3.8.3. Timeout

3.9. Plugin: redis

The redis plugin periodically connects to a redis server and sends alerts if the server does not respond to a PING command (with +PONG) or anything else went wrong on the DNS/TCP level.

3.9.1. Host

The host (may include a port) of the redis server to check. Every syntax which AnyEvent::Socket’s parse_hostport function understands is accepted (this covers all common IPv4 and IPv6 formats).

You can specify this configuration directive multiple times.

Full Host example:

<monitor my redis>
    plugin = redis
    host   = localhost:6379
</monitor>

3.9.2. Family

See [http-family]. Make sure to use family = ipv4 here as long as redis does not support IPv6.

4. Config recipes

4.1. Variables

When configuring many hosts which are alike, variables might come in handy. kanla enables Config::General::Interpolated, so you can use variables by refering to configuration directives with a dollar sign prefixed, e.g. $send_alerts_to evaluates to the contents of the send_alerts_to directive.

You are not confined to the directives which actually exist, but you can use any key/value pair you want. To avoid troubles, you should prefix your own directives with an underscore. kanla will never implement directives starting with an underscore.

Amending send_alerts_to:

# [email protected] gets ALL the alerts!
send_alerts_to = <<EOT
[email protected]
EOT

<monitor i3wm website>
    plugin         = http
    url            = "http://www.i3wm.org/"

    # Additionally, alert [email protected].
    send_alerts_to = <<EOT
    $send_alerts_to
    [email protected]
    EOT
</monitor>

Monitor multiple trac instances (/etc/kanla/default.cfg):

_trac_url = "bugs.i3wm.org"
include "default.d/http-trac.cfg"

_trac_url = "trac.edgewall.org"
include "default.d/http-trac.cfg"

Monitor multiple trac instances (/etc/kanla/default.d/http-trac.cfg):

<monitor trac at $_trac_url>
    plugin = http
    url    = $_trac_url
# TODO: content matching
</monitor>

5. Technical details

5.1. Message receipts

kanla supports jabber’s XEP-0184 message receipts. With message receipts, delivery of a message is confirmed end-to-end, that is, kanla can be sure your device received its message. More importantly, when your device does not confirm the message, kanla will retry (after 30 seconds) to send the message, until your device confirms the message.

You will immediately notice this feature when you configure multiple jabber accounts. In that case, kanla will try to deliver alerts using one jabber account first, and if that fails, it retries with the second one. In case you have more than two accounts, this is repeated until your device confirmed the message or there are no accounts left.

For send_alerts_to destinations which do not support XEP-0184, kanla will immediately send alerts from all jabber accounts. This leads to a storm of alerts on your device, but this is intended to keep things simple. Lobby your jabber client vendor to implement XEP-0184, it’s simple enough.

5.2. Jabber presences

Let’s assume you configure [email protected] via send_alerts_to and you are connected as [email protected] on your computer and on your smartphone. Jabber calls each of these connections a presence.

When sending an alert to [email protected], kanla will chose the presence with the highest priority. In case multiple presences have the same priority, one of them will be arbitrarily chosen.

By changing the priority in your jabber client(s), you can therefore influence where alerts should go.

5.3. Writing plugins

The recommended way to write plugins is to write them in Perl and use the Perl module Kanla::Plugin. This module sets up some useful imports, the signal_error function, configuration parsing and a main timer. The only thing you have to write yourself is a function called run, which will be invoked regularly.

Additionally, if the service you are monitoring works in a banner-like way (e.g. you connect, possibly write some data, then read some data), you can use Kanla::Plugin::Banner. This module establishes a connection when you call banner_connect and takes care of everything: resolving the host, connecting to the resolved addresses, raising errors on timeouts.

For an example of using these modules, check the “git” plugin.

Of course, plugins can be implemented in any language. See the next section for details.

5.4. Plugin interface

Plugins are executable files which run for an indefinite amount of time. When they exit, regardless of their exit status, they will be restarted after 2 seconds.

On stdin, plugins get their configuration. The plugin’s stdout and stderr is not interpreted by kanla.

On file descriptor 3, plugins send alerts, formatted as JSON.

5.4.1. Example configuration input

The configuration for plugins is dumped by Config::General.

Here is the http plugin’s example configuration as sent to the plugin via stdin:

Example configuration input:

plugin   http
family   ipv4
url   http://slashdot.org/

Given that the config is generated by Config::General, it is of course the best idea to use that module to read the configuration, too. However, this is not required and, especially if you are using a different language than Perl, parsing this key/value format using regular expressions should work fine.

5.4.2. Example alert output

Each message is represented by a JSON hash. Currently, only three keys are defined:

severity

The severity of the message. Note that kanla currently ignores this.

message

A free text message which will be sent to the configured send_alerts_to destination. Make this as helpful as possible, but keep it as verbose as necessary.

id

A unique id for this specific problem. Used in order to suppress messages unless the problem persists. Should start with the plugin name, e.g. http.http://i3wm.org/.404

Example alert JSON output:

{
 "severity": "critical",
 "message": "This will be sent via Jabber",
 "id": "doc.userguide.example"
}