Collectd and Kibana experiences

Jun. 25, 2015


Lately I’ve been testing a way to display system metrics in Kibana 4. The reason is that, rather than having information from plain logs and graphs in Graphite, I would like a centralized place where I can visualize logs, visualize metrics from various sources (especially collectd), and search log entries.

When Kibana 4 was released, its most visible changes were in the way you can present and create graphs. Great.

My previous setup consisted of collectd and Graphite, storing the metrics in a time series database, so I was eager to find out whether Elasticsearch was suited for storing time series data. Reading up on TSDBs and various forum comments, people tend to lean towards it not being a good idea, though without giving a good explanation. The Elasticsearch developer I talked with (perhaps unsurprisingly) informed me that it should not be a problem.

I started off by sending collectd data directly to Logstash, using the UDP input plugin on the Logstash server:

udp {
  port        => 25826
  buffer_size => 1452
  codec       => collectd {
    authfile       => "/usr/local/etc/collectd.auth"
    typesdb        => "/etc/types.db"
    security_level => "Encrypt"
  }
  type => "collectd"
}
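On the client side, the matching collectd network plugin configuration could look something like this. This is just a sketch; the server hostname, username and password are placeholders, and the credentials must of course match an entry in the auth file referenced above:

```
# /etc/collectd.conf (client side, sketch)
LoadPlugin network
<Plugin network>
  <Server "logstash.example.com" "25826">
    SecurityLevel Encrypt
    Username "collectd"
    Password "secret"
  </Server>
</Plugin>
```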

While testing, I only included a single host to monitor, to see how it would look.

An event is stored in Elasticsearch like this:

{
  "_index": "logstash-2015.08.27",
  "_type": "collectd",
  "_id": "AU9woJq6imCBkq4M2M2D",
  "_score": null,
  "_source": {
    "host": "mysql01",
    "@timestamp": "2015-08-27T19:12:31.720Z",
    "plugin": "mysql",
    "plugin_instance": "innodb",
    "type_instance": "file_writes",
    "collectd_type": "counter",
    "value": 955425,
    "@version": "1",
    "type": "collectd"
  },
  "fields": {
    "@timestamp": [
      1440702751720
    ]
  },
  "highlight": {
    "type": [
      "@kibana-highlighted-field@collectd@/kibana-highlighted-field@"
    ],
    "type.raw": [
      "@kibana-highlighted-field@collectd@/kibana-highlighted-field@"
    ]
  },
  "sort": [
    1440702751720
  ]
}
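Since every sample is a flat document like the one above, pulling a metric back out is just an aggregation. A sketch of an Elasticsearch query body (field values taken from the event above; the bucket interval is an example) that averages one counter per 30-second bucket:

```
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "host": "mysql01" } },
            { "term": { "plugin": "mysql" } },
            { "term": { "type_instance": "file_writes" } }
          ]
        }
      }
    }
  },
  "aggs": {
    "over_time": {
      "date_histogram": { "field": "@timestamp", "interval": "30s" },
      "aggs": {
        "avg_value": { "avg": { "field": "value" } }
      }
    }
  }
}
```

This is essentially what Kibana 4 builds for you behind the scenes when you create a line chart with a date histogram on the X-axis and an average metric on the Y-axis.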

Great, the collectd clients successfully authenticate to the UDP input on Logstash, and the data is then shipped to Elasticsearch. Success!

Scaling up a bit: I like to gather as much information as possible from each host and all of its services. I ran my testing on a single node with a simple LAMP stack, gathering most of the system-level details, almost every value in MySQL, and information from mod_status for Apache; about 60 metrics are stored for this host alone. I let collectd update every 30 seconds. This works fine on a modest Xen-virtualized server with 8 GB RAM, 100 GB storage and 4 vCPUs.
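The collectd configuration for such a host is straightforward. A sketch of the relevant parts (the plugin selection and credentials here are examples, not my exact config):

```
# /etc/collectd.conf (monitored LAMP host, sketch)
Interval 30

LoadPlugin cpu
LoadPlugin memory
LoadPlugin df
LoadPlugin mysql
LoadPlugin apache

<Plugin mysql>
  <Database "mysql">
    Host "localhost"
    User "collectd"
    Password "secret"
  </Database>
</Plugin>

<Plugin apache>
  <Instance "localhost">
    URL "http://localhost/server-status?auto"
  </Instance>
</Plugin>
```

The mysql plugin connects to the server and reads its status counters, and the apache plugin scrapes the machine-readable mod_status output; between them and the base system plugins you quickly end up with dozens of metrics per host.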

Now I want to see if it handles a bit more, so I add some more hosts with various new services to monitor: haproxy, nginx, php-fpm, mongodb, beanstalkd, rabbitmq and memcached.

I am now collecting about 11,000 metrics every 30 seconds without breaking a sweat. Elasticsearch seems to do the job great at this scale; I have yet to test with more data, but so far it looks good.

Below are the various components of an application gathered in the dashboard view in Kibana 4.


Caveats to be aware of:

Storing metrics at this resolution, with updates every 30 seconds, will create a fair amount of data: with 11,000 metrics indexed every 30 seconds, the storage use is about 1.7 GB every 24 hours. Not a big problem if you have a good amount of storage and appreciate the detail. A possible workaround is to store data in time-based indices and create a retention scheme based on them.
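A quick back-of-the-envelope for such a retention scheme, using the ~1.7 GB/day figure above (the retention periods are just examples). Since Logstash already writes one index per day (logstash-YYYY.MM.DD), retention simply means deleting indices older than the cutoff:

```python
# Estimate disk usage for daily logstash-* indices at ~1.7 GB/day.
GB_PER_DAY = 1.7  # observed with ~11,000 metrics every 30 seconds

def storage_gb(retention_days):
    """Total storage needed to keep `retention_days` daily indices."""
    return GB_PER_DAY * retention_days

for days in (7, 30, 90):
    print(f"{days:3d} days retention: ~{storage_gb(days):.1f} GB")
```

So even a 90-day window stays in the low hundreds of gigabytes; tools like Elasticsearch Curator can then enforce the cutoff by deleting the old daily indices.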

Creating the graphs in Kibana 4 can be time consuming; there is no simple way to display a graph. However, once you get used to it, it goes fast, and you can create a sample dashboard displaying the common services on a Linux system (memory, CPU, disk, entropy, NTP information) and sort by host.