Tw Bert

Reputation: 3809

How to monitor the size of a directory via Telegraf

We need to monitor the size of a directory (for example, the data directory of InfluxDB) to set up alerts in Grafana. As mentioned in "How to configure telegraf to send a folder-size to influxDB", there is no built-in plugin for this.

We don't mind using the inputs.exec section of Telegraf. The directories are not huge (low filecount + dircount), so deep scanning (like the use of du) is fine by us.

One of the directories we need to monitor is /var/lib/influxdb/data.

What would be a simple script to execute, and what are the caveats?

Upvotes: 9

Views: 12180

Answers (5)

nagylzs

Reputation: 4180

I'm posting a new answer because the other solutions do not consider that telegraf runs as the telegraf user by default, and that user does not have permission to list files under most directories. Shell scripts cannot have the suid bit set, and all of the solutions provided so far either require the telegraf user to have access to all monitored directories, or require running telegraf as a different user. All of these "solutions" pose security risks.

I have created a small project here https://github.com/nagylzs/dudir to overcome these problems. It contains instructions about how to use it.

  • The safest way is to hard-code the directory names into a new executable, set the suid bit and call it from telegraf.
  • The second safest way is to use the original version of the program, and pass directory names on the command line. It still increases security because you cannot read file contents or list directory contents; it only lets you get the directory size. In this case, I would do chown root:telegraf dudir and then chmod 4550 dudir.
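
The ownership and mode from the second option could be applied and checked roughly as follows. This is only a sketch: it assumes a compiled dudir binary and GNU stat, the install path /usr/local/bin/dudir is a placeholder rather than something the project prescribes, and check_suid is a made-up helper name. The chown and chmod lines need root.

```shell
# Print the octal mode and owner:group of a file, and fail unless the
# setuid bit is set. Uses GNU stat (-c), so this assumes Linux.
check_suid() {
  stat -c '%a %U:%G %n' "$1" || return 1
  [ -u "$1" ]
}

# As root, with BIN pointing at wherever dudir was installed (hypothetical path):
#   BIN=/usr/local/bin/dudir
#   chown root:telegraf "$BIN"
#   chmod 4550 "$BIN"
#   check_suid "$BIN"
```

The order matters: chown may clear the setuid bit, so chmod goes second.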

Upvotes: 1

valodzka

Reputation: 5805

It's possible natively with the filecount plugin:

[[inputs.filecount]]
directories = ["/var/lib/influxdb/engine/data"]

Output:

> filecount,directory=/var/lib/influxdb/engine/data,host=psg count=424i,size_bytes=387980393i 1652195855000000000
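
To sanity-check the count and size_bytes values, the regular files can be counted and summed by hand. This is a rough sketch, assuming GNU find and assuming the plugin is left at its defaults (which, as far as I know, are recursive and regular files only); sum_regular_files is an invented name. It only sees what the invoking user is allowed to read.

```shell
# Count the regular files below a directory and sum their apparent sizes
# in bytes. Assumes GNU find (-printf). Prints "count size_bytes".
sum_regular_files() {
  find "$1" -type f -printf '%s\n' |
    awk '{ n++; s += $1 } END { printf "%d %.0f\n", n, s }'
}

# Example:
#   sum_regular_files /var/lib/influxdb/engine/data
```

The result will not match du exactly, since du also accounts for the directories themselves.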

Upvotes: 9

Oliver Prislan

Reputation: 330

The solutions already provided look good to me, and highlighting caveats such as read permissions is great. An alternative worth mentioning is using Telegraf's disk input to collect the data, as proposed in "monitor diskspace on influxdb with telegraf". Note that it reports usage per mount point rather than per directory.

[[outputs.influxdb]]
  urls = ["udp://your_host:8089"]
  database = "telegraf_metrics"

  ## Retention policy to write to. Empty string writes to the default rp.
  retention_policy = ""
  ## Write consistency (clusters only), can be: "any", "one", "quorum", "all"
  write_consistency = "any"

  ## Write timeout (for the InfluxDB client), formatted as a string.
  ## If not provided, will default to 5s. 0s means no timeout (not recommended).
  timeout = "5s" 

# Read metrics about disk usage by mount point
[[inputs.disk]]
  ## By default, telegraf gather stats for all mountpoints.
  ## Setting mountpoints will restrict the stats to the specified mountpoints.
  # mount_points = ["/"]

  ## Ignore some mountpoints by filesystem type. For example (dev)tmpfs (usually
  ## present on /run, /var/run, /dev/shm or /dev).
  ignore_fs = ["tmpfs", "devtmpfs"]

Note: the timeout should be considered carefully, and so should how often readings are taken. Hourly readings may be sufficient and would avoid exhausting the disk with the logged data itself.

Upvotes: 1

JGC

Reputation: 6373

If you need to monitor multiple directories, I updated the answer by Tw Bert and extended it so that you can pass them all on one command line. This saves you from adding multiple [[inputs.exec]] entries to your telegraf.conf file.

Create the file /etc/telegraf/scripts/disk-usage.sh containing:

#!/bin/bash

echo "["
du -ks "$@" | awk '{if (NR!=1) {printf ",\n"};printf "  { \"directory_size_kilobytes\": "$1", \"path\": \""$2"\" }";}'
echo
echo "]"
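
One limitation of the awk one-liner is that $2 stops at the first whitespace, so a path containing a space gets truncated. A possible variant, sketched here as a shell function (the name disk_usage_json is invented for illustration), splits du's output on the tab character instead. It still does not escape quotes or backslashes in path names.

```shell
# Emit a JSON array with one object per directory given as an argument.
# du separates size and path with a tab, so splitting on tabs keeps
# paths with spaces in one piece. Quotes/backslashes are NOT escaped.
disk_usage_json() {
  echo "["
  du -ks "$@" | awk -F'\t' '
    NR > 1 { printf ",\n" }
    { printf "  { \"directory_size_kilobytes\": %s, \"path\": \"%s\" }", $1, $2 }
    END { printf "\n" }'
  echo "]"
}

# Example:
#   disk_usage_json /mnt/user/appdata/influxdb /mnt/user/appdata/grafana
```

The field names match the script above, so the same Telegraf settings apply.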

I want to monitor two directories: /mnt/user/appdata/influxdb and /mnt/user/appdata/grafana. I can do something like this:

# Get disk usage for multiple directories
[[inputs.exec]]
  commands = [ "/etc/telegraf/scripts/disk-usage.sh /mnt/user/appdata/influxdb /mnt/user/appdata/grafana" ]
  timeout = "5s"
  name_override = "du"
  name_suffix = ""
  data_format = "json"
  tag_keys = [ "path" ]

Once you've updated your config, you can test this with:

telegraf --debug --config /etc/telegraf/telegraf.conf --input-filter exec --test

Which should show you what Telegraf will push to influx:

bash-4.3# telegraf --debug --config /etc/telegraf/telegraf.conf --input-filter exec --test
> du,host=SomeHost,path=/mnt/user/appdata/influxdb directory_size_kilobytes=80928 1536297559000000000
> du,host=SomeHost,path=/mnt/user/appdata/grafana directory_size_kilobytes=596 1536297559000000000

Upvotes: 9

Tw Bert

Reputation: 3809

You could create a simple bash script metrics-exec_du.sh with the following content (chmod 755):

#!/usr/bin/env bash
du -bs "${1}" | awk '{print "[ { \"bytes\": "$1", \"dudir\": \""$2"\" } ]";}'

And activate it by putting the following in the Telegraf config file:

[[inputs.exec]]
  commands = [ "YOUR_PATH/metrics-exec_du.sh /var/lib/influxdb/data" ]
  timeout = "5s"
  name_override = "du"
  name_suffix = ""
  data_format = "json"
  tag_keys = [ "dudir" ]

Caveats:

  1. The du command can stress your server, so use with care
  2. The user telegraf must be able to scan the dirs. There are several options, but since InfluxDB's directory mask is a bit unspecified (see: https://github.com/influxdata/influxdb/issues/5171#issuecomment-306419800), we applied a rather crude workaround (examples are for Ubuntu 16.04.2 LTS):
    • Add the influxdb group to the user telegraf: sudo usermod --groups influxdb --append telegraf
    • Put the following in the crontab to run, for example, every 10 minutes: */10 * * * * chmod -R g+rX /var/lib/influxdb/data > /var/log/influxdb/chmodfix.log 2>&1
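
Whether the group change and the chmod have taken effect can be checked by listing whatever the invoking user still cannot read. This is a sketch, assuming GNU find (for -readable); list_unreadable is an illustrative name. It is meant to be run in a shell owned by the telegraf user, so that it sees what du would trip over.

```shell
# Print every entry under a directory that the invoking user cannot read.
# Relies on GNU find's -readable test; error messages from directories
# that cannot be opened are silenced.
list_unreadable() {
  find "$1" ! -readable -print 2>/dev/null
}

# Example, run as the telegraf user:
#   list_unreadable /var/lib/influxdb/data
# No output means everything find could reach is readable.
```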

Result, configured in Grafana (data source: InfluxDB): [image: Grafana dirsize monitoring]

Cheers, TW

Upvotes: 11
