Reputation: 2231
I'm using Telegraf, InfluxDB and Grafana to build a monitoring system for a distributed application. The first thing I want to do is count the number of Java processes running on a machine.
But when I run my query, the number of processes is nearly random (anywhere between 1 and 8 instead of always 8).
I think there is a mistake in my Telegraf configuration, but I don't see where. I tried changing the interval, but nothing changed: it seems InfluxDB doesn't receive all the data.
I'm running CentOS 7 and Telegraf v1.5.0 (git: release-1.5 a1668bbf).
All the Java processes I want to count:
[root@localhost ~]# pgrep -f java
10665
10688
10725
10730
11104
11174
16298
22138
My telegraf.conf :
[global_tags]
# Configuration for telegraf agent
[agent]
interval = "5s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
debug = true
quiet = false
logfile = "/var/log/telegraf/telegraf.log"
hostname = "my_server"
omit_hostname = false
My input.conf :
# Read metrics about disk usage
[[inputs.disk]]
fielddrop = [ "inodes*" ]
mount_points=["/", "/workspace"]
# File
[[inputs.filestat]]
files = ["myfile.log"]
# Read the number of running Java processes
[[inputs.procstat]]
user = "root"
pattern = "java"
My request :
The response :
Upvotes: 1
Views: 3514
Reputation: 38
If you just want to count PIDs, a good way is to use the exec plugin, like this:
[[inputs.exec]]
commands = ["pgrep -c java"]  # command to execute
name_override = "the_name"    # measurement name
data_format = "value"         # parse the output as a single value (field named "value")
data_type = "integer"         # the count is an integer
For commands, use pgrep -c java without the -f option: -f matches against the "full" command line, so it also counts the pgrep command itself (and you end up with almost the same problem as with procstat).
Solution found here
Upvotes: 2
Reputation: 19337
With pattern matching, if the pattern matches multiple PIDs, multiple data points are generated with identical tags and timestamps. When these points are sent to InfluxDB, only the last point is stored.
Example of what may happen with your configuration:
00:00 => pid 1
00:05 => pid 2
00:10 => pid 1
00:15 => pid 5
00:20 => pid 7
00:25 => pid 3
00:30 => pid 3
00:35 => pid 4
00:40 => pid 6
00:45 => pid 7
00:50 => pid 6
00:55 => pid 5
Different pids over one minute = 7 (pid 8 was not stored a single time)
Since it's random, you sometimes hit the 8 different pids in a minute, but most of the time you don't.
To differentiate between processes whose tags are otherwise the same, use pid_tag = true:
[[inputs.procstat]]
user = "root"
pattern = "java"
pid_tag = true
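The overwrite behaviour can be illustrated with a small Python simulation. This is only a sketch, not InfluxDB or Telegraf code: the dictionary store, the write_points and collect helpers, and the exact tag names are assumptions made for illustration. It models a point as being identified by measurement, tag set and timestamp, so a second write with the same identity replaces the first.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the database: one stored point per
# (measurement, tag set, timestamp). This mimics the overwrite rule
# described above; it is not real InfluxDB code.
Key = Tuple[str, Tuple[Tuple[str, str], ...], int]
Point = Tuple[str, Dict[str, str], int, Dict[str, int]]


def write_points(store: Dict[Key, Dict[str, int]], points: List[Point]) -> None:
    """Write points into the store; an identical key overwrites (last write wins)."""
    for measurement, tags, timestamp, fields in points:
        key = (measurement, tuple(sorted(tags.items())), timestamp)
        store[key] = fields


def collect(pids: List[int], timestamp: int, pid_tag: bool) -> List[Point]:
    """Build one point per PID, roughly as a procstat-like collector might.

    The tag names here are illustrative assumptions.
    """
    points = []
    for pid in pids:
        tags = {"host": "my_server", "user": "root", "pattern": "java"}
        if pid_tag:
            tags["pid"] = str(pid)  # the extra tag makes each series distinct
        points.append(("procstat", tags, timestamp, {"pid": pid}))
    return points


# The eight PIDs from the question's pgrep output.
PIDS = [10665, 10688, 10725, 10730, 11104, 11174, 16298, 22138]

without_tag: Dict[Key, Dict[str, int]] = {}
write_points(without_tag, collect(PIDS, timestamp=0, pid_tag=False))

with_tag: Dict[Key, Dict[str, int]] = {}
write_points(with_tag, collect(PIDS, timestamp=0, pid_tag=True))

print(len(without_tag))  # 1: all eight points share one identity
print(len(with_tag))     # 8: the pid tag keeps them apart
```

Without the pid tag, only one of the eight points survives each collection interval, which is why a count of distinct PIDs over a time window fluctuates.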
However, if you just want to count the number of processes (and don't care about the stats), just use the exec plugin with a custom command like pgrep -c -f java. This is more efficient than having multiple time series (with pid_tag you end up with one per PID).
Upvotes: 1