Reputation: 3809
We need to monitor the size of a directory (for example, the data directory of InfluxDB) to set up alerts in Grafana. As mentioned in "How to configure telegraf to send a folder-size to influxDB", there is no built-in plugin for this.
We don't mind using the inputs.exec section of Telegraf. The directories are not huge (low file count and directory count), so deep scanning (such as with du) is fine by us.
One of the directories we need to monitor is /var/lib/influxdb/data.
What would be a simple script to execute, and what are the caveats?
Upvotes: 9
Views: 12180
Reputation: 4180
I'm posting a new answer because the other solutions do not consider that telegraf runs as the telegraf user by default, and that user does not have permission to list files under most directories. Shell scripts cannot make use of the suid bit (Linux ignores it for interpreted scripts), and all of the solutions provided so far either require the telegraf user to have access to all monitored directories or run telegraf as a different user. All of these "solutions" pose security risks.
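To see whether this permission problem affects a given directory, one quick check is to run du as the telegraf user and look at its exit status. The sketch below assumes that du exits non-zero when it cannot read part of the tree (as GNU du does); the function name can_scan is made up for illustration.

```bash
#!/usr/bin/env bash
# can_scan DIR: succeed only if the current user can walk the whole tree.
# Assumption: du returns a non-zero exit status when any entry is unreadable.
can_scan() {
  du -s -- "$1" >/dev/null 2>&1
}

# Only run the check when a directory is passed on the command line.
if [ "$#" -gt 0 ]; then
  if can_scan "$1"; then
    echo "ok: $1 can be scanned by $(id -un)"
  else
    echo "fail: $1 cannot be fully scanned by $(id -un)"
  fi
fi
```

Saved under a hypothetical name such as can-scan.sh, it could be run as sudo -u telegraf ./can-scan.sh /var/lib/influxdb/data to test from the telegraf user's point of view.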
I have created a small project, https://github.com/nagylzs/dudir, to overcome these problems. It contains instructions on how to use it.
In short, the binary is installed setuid root and is executable only by root and the telegraf group: chown root:telegraf dudir and then chmod 4550 dudir.
Upvotes: 1
Reputation: 5805
This is possible natively with the filecount plugin:
[[inputs.filecount]]
directories = ["/var/lib/influxdb/engine/data"]
Output:
> filecount,directory=/var/lib/influxdb/engine/data,host=psg count=424i,size_bytes=387980393i 1652195855000000000
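To cross-check the plugin's numbers by hand, the same two values can be approximated with find. This is only a sketch: it assumes GNU find (for -printf) and that the plugin counts regular files recursively, which I believe is its default but is worth confirming in the plugin documentation. The function name dir_stats is made up.

```bash
#!/usr/bin/env bash
# dir_stats DIR: print the number of regular files under DIR and their total size in bytes.
dir_stats() {
  # -printf '%s\n' prints each file's size in bytes (GNU find only)
  find "$1" -type f -printf '%s\n' |
    awk '{ n++; s += $1 } END { printf "count=%d size_bytes=%d\n", n, s }'
}

# Only run when a directory is passed on the command line.
if [ "$#" -gt 0 ]; then
  dir_stats "$1"
fi
```

If the numbers differ noticeably from what Telegraf reports, the plugin's filter options (or files changing between the two runs) are the first things to look at.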
Upvotes: 9
Reputation: 330
The solutions already provided look good to me, and highlighting caveats such as read permissions is great. An alternative worth mentioning is using Telegraf's disk input to collect the data, as proposed in "monitor diskspace on influxdb with telegraf".
[[outputs.influxdb]]
urls = ["udp://your_host:8089"]
database = "telegraf_metrics"
## Retention policy to write to. Empty string writes to the default rp.
retention_policy = ""
## Write consistency (clusters only), can be: "any", "one", "quorum", "all"
write_consistency = "any"
## Write timeout (for the InfluxDB client), formatted as a string.
## If not provided, will default to 5s. 0s means no timeout (not recommended).
timeout = "5s"
# Read metrics about disk usage by mount point
[[inputs.disk]]
## By default, telegraf gather stats for all mountpoints.
## Setting mountpoints will restrict the stats to the specified mountpoints.
# mount_points = ["/"]
## Ignore some mountpoints by filesystem type. For example (dev)tmpfs (usually
## present on /run, /var/run, /dev/shm or /dev).
ignore_fs = ["tmpfs", "devtmpfs"]
Note: the timeout should be considered carefully. Hourly readings may be sufficient and would avoid exhausting resources with excessive logging.
Upvotes: 1
Reputation: 6373
If you need to monitor multiple directories: I updated the answer by Tw Bert and extended it so that you can pass them all on one command line. This saves you from adding multiple [[inputs.exec]] entries to your telegraf.conf file.
Create the file /etc/telegraf/scripts/disk-usage.sh containing:
#!/bin/bash
echo "["
du -ks "$@" | awk '{if (NR!=1) {printf ",\n"};printf " { \"directory_size_kilobytes\": "$1", \"path\": \""$2"\" }";}'
echo
echo "]"
I want to monitor two directories: /mnt/user/appdata/influxdb and /mnt/user/appdata/grafana. I can do something like this:
# Get disk usage for multiple directories
[[inputs.exec]]
commands = [ "/etc/telegraf/scripts/disk-usage.sh /mnt/user/appdata/influxdb /mnt/user/appdata/grafana" ]
timeout = "5s"
name_override = "du"
name_suffix = ""
data_format = "json"
tag_keys = [ "path" ]
Once you've updated your config, you can test this with:
telegraf --debug --config /etc/telegraf/telegraf.conf --input-filter exec --test
This should show you what Telegraf will push to InfluxDB:
bash-4.3# telegraf --debug --config /etc/telegraf/telegraf.conf --input-filter exec --test
> du,host=SomeHost,path=/mnt/user/appdata/influxdb directory_size_kilobytes=80928 1536297559000000000
> du,host=SomeHost,path=/mnt/user/appdata/grafana directory_size_kilobytes=596 1536297559000000000
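One thing to watch with the script above: awk splits on whitespace by default, so a directory name containing a space would be cut off at the first space. A possible variant, sketched below, splits on the tab that du puts between size and path instead. It still does not escape quotes or backslashes in paths, and the function name du_json is my own.

```bash
#!/usr/bin/env bash
# du_json DIR...: print a JSON array with one object per directory.
du_json() {
  # du separates size and path with a tab, so split on that, not on spaces
  du -ks -- "$@" | awk -F'\t' '
    BEGIN { printf "[" }
    { printf "%s\n  { \"directory_size_kilobytes\": %s, \"path\": \"%s\" }", (NR > 1 ? "," : ""), $1, $2 }
    END   { print "\n]" }
  '
}

# Only run when at least one directory is passed on the command line.
if [ "$#" -gt 0 ]; then
  du_json "$@"
fi
```

The Telegraf configuration stays the same, since the field and tag names are unchanged.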
Upvotes: 9
Reputation: 3809
You could create a simple bash script, metrics-exec_du.sh, with the following content (and make it executable with chmod 755):
#!/usr/bin/env bash
du -bs "${1}" | awk '{print "[ { \"bytes\": "$1", \"dudir\": \""$2"\" } ]";}'
And activate it by putting the following in the Telegraf config file:
[[inputs.exec]]
commands = [ "YOUR_PATH/metrics-exec_du.sh /var/lib/influxdb/data" ]
timeout = "5s"
name_override = "du"
name_suffix = ""
data_format = "json"
tag_keys = [ "dudir" ]
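As an alternative to JSON, the script could emit InfluxDB line protocol, which the exec input can parse when data_format is set to "influx". The sketch below is a variant built on assumptions, not the script used above: the function name du_line is hypothetical, du -b requires GNU du, and only spaces, commas and equals signs in the path are escaped.

```bash
#!/usr/bin/env bash
# du_line DIR: print one line-protocol point of the form
#   du,dudir=<escaped path> bytes=<size>i
du_line() {
  # The first tab-separated field of du's output is the size in bytes
  size=$(du -bs -- "$1" 2>/dev/null | cut -f1)
  [ -n "$size" ] || return 1
  # Tag values need spaces, commas and equals signs backslash-escaped
  tag=$(printf '%s' "$1" | sed 's/[ ,=]/\\&/g')
  printf 'du,dudir=%s bytes=%si\n' "$tag" "$size"
}

# Only run when a directory is passed on the command line.
if [ "$#" -gt 0 ]; then
  du_line "$1"
fi
```

With this variant, data_format would be "influx" and tag_keys would no longer be needed, because the tag is already part of the emitted line.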
Caveats:
- The du command can stress your server, so use it with care.
- The telegraf user must be able to scan the directories. There are several options, but since InfluxDB's directory mask is a bit unspecified (see: https://github.com/influxdata/influxdb/issues/5171#issuecomment-306419800), we applied a rather crude workaround (examples are for Ubuntu 16.04.2 LTS):
  - Add the influxdb group to the user telegraf: sudo usermod --groups influxdb --append telegraf
  - Re-apply group read permissions from cron, here at minute 10 of every hour: 10 * * * * chmod -R g+rX /var/lib/influxdb/data > /var/log/influxdb/chmodfix.log 2>&1
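After changing group membership, keep in mind that an already running telegraf process keeps its old groups until it is restarted. A small check like the following can confirm that the membership itself is in place; the function name in_group is invented for this sketch.

```bash
#!/usr/bin/env bash
# in_group USER GROUP: succeed if USER is a member of GROUP (primary or supplementary).
in_group() {
  # id -nG prints the user's group names separated by spaces
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$2"
}

# Only run when both a user and a group are passed on the command line.
if [ "$#" -eq 2 ]; then
  if in_group "$1" "$2"; then
    echo "$1 is in group $2"
  else
    echo "$1 is NOT in group $2"
  fi
fi
```

For the setup described here, it would be called with telegraf and influxdb as the two arguments.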
Result, configured in Grafana (data source: InfluxDB):
Cheers, TW
Upvotes: 11