Emrys Myrooin

Reputation: 2231

Count the number of running processes with Telegraf

I'm using Telegraf, InfluxDB and Grafana to build a monitoring system for a distributed application. The first thing I want to do is count the number of Java processes running on a machine.

But when I make my request, the number of processes is nearly random (anywhere between 1 and 8, instead of always 8).

I think there is a mistake in my Telegraf configuration, but I don't see where. I tried changing the interval, but it made no difference: it seems InfluxDB doesn't have all the data.

I'm running CentOS 7 and Telegraf v1.5.0 (git: release-1.5 a1668bbf)

All the Java processes I want to count:

[root@localhost ~]# pgrep -f java
10665
10688
10725
10730
11104
11174
16298
22138

My telegraf.conf:

[global_tags]

# Configuration for telegraf agent
[agent]
  interval = "5s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = true
  quiet = false
  logfile = "/var/log/telegraf/telegraf.log"
  hostname = "my_server"
  omit_hostname = false

My input.conf:

# Read metrics about disk usage
[[inputs.disk]]
  fielddrop = [ "inodes*" ]
  mount_points = ["/", "/workspace"]

# File
[[inputs.filestat]]
  files = ["myfile.log"]

# Read the number of running Java processes
[[inputs.procstat]]
  user = "root"
  pattern = "java"

My request:

[image: request]

The response:

[image: response]

Upvotes: 1

Views: 3514

Answers (2)

Bugsyaya

Reputation: 38

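If you just want to count PIDs, a good way is to use the exec plugin like this:

[[inputs.exec]]
  commands = ["pgrep -c java"]  # command to execute
  name_override = "the_name"    # measurement name in InfluxDB
  data_format = "value"         # parse the output as a single value (stored in the field "value")
  data_type = "integer"         # pgrep -c prints an integer count

For commands, use pgrep -c java without the -f option: -f matches against the full command line, so it also counts processes that merely mention java in their arguments (for example a shell wrapping the pgrep command), and you end up with almost the same problem as with procstat.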
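
To illustrate the difference between matching on the process name and matching on the full command line, here is a minimal Python sketch. The process table is hypothetical (made up for illustration) and the function only mimics how pgrep selects processes; it does not call pgrep itself.

```python
import re

# Hypothetical process table: (process name, full command line).
# These entries are assumptions for illustration, not taken from the question.
PROCESSES = [
    ("java", "java -jar app.jar"),
    ("java", "/usr/bin/java -Xmx1g -jar worker.jar"),
    ("sh", "sh -c pgrep -c -f java"),
    ("vim", "vim Main.java"),
]


def count_matches(pattern, full=False):
    """Count processes whose name (or full command line if full=True) matches pattern."""
    rx = re.compile(pattern)
    return sum(
        1 for name, cmdline in PROCESSES
        if rx.search(cmdline if full else name)
    )


name_count = count_matches("java")             # like `pgrep -c java`
full_count = count_matches("java", full=True)  # like `pgrep -c -f java`
print(name_count, full_count)  # 2 4
```

With this made-up table, matching on the name counts only the two real Java processes, while matching on the full command line also picks up the wrapper shell and the editor.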

Solution found here

Upvotes: 2

KeatsPeeks

Reputation: 19337

With pattern matching, if the pattern matches multiple PIDs, multiple data points are generated with identical tags and timestamps. When these points are sent to InfluxDB, only the last point is stored.

Example of what may happen with your configuration:

00:00 => pid 1
00:05 => pid 2
00:10 => pid 1
00:15 => pid 5
00:20 => pid 7
00:25 => pid 3
00:30 => pid 3
00:35 => pid 4
00:40 => pid 6
00:45 => pid 7
00:50 => pid 6
00:55 => pid 5
Different pids over one minute = 7 (pid 8 was not stored a single time)

Since it's random, you sometimes hit the 8 different pids in a minute, but most of the time you don't.


To differentiate between processes whose tags are otherwise the same, use pid_tag = true:

[[inputs.procstat]]
  user = "root"
  pattern = "java"
  pid_tag = true
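
The overwrite behaviour, and why a pid tag avoids it, can be sketched in a few lines of Python. This is only a simulation under stated assumptions: it models InfluxDB's rule that a point is identified by measurement, tag set and timestamp, and the tag names and the write_points helper are hypothetical, not the exact output of the procstat plugin.

```python
def write_points(pids, timestamp, pid_tag=False):
    """Simulate storing one procstat point per PID at the same timestamp.

    A point is keyed by (measurement, tag set, timestamp); writing a second
    point with the same key replaces the first one.
    """
    stored = {}
    for pid in pids:
        # Illustrative tag set (assumed names, not the plugin's exact tags).
        tags = {"host": "my_server", "user": "root", "pattern": "java"}
        if pid_tag:
            tags["pid"] = str(pid)  # the PID becomes part of the series key
        key = ("procstat", tuple(sorted(tags.items())), timestamp)
        stored[key] = {"pid": pid}  # same key => previous point is overwritten
    return stored


# The eight PIDs listed in the question.
java_pids = [10665, 10688, 10725, 10730, 11104, 11174, 16298, 22138]

without_tag = write_points(java_pids, timestamp=0)
with_tag = write_points(java_pids, timestamp=0, pid_tag=True)
print(len(without_tag), len(with_tag))  # 1 8
```

Without the tag, all eight points share one key and only the last one survives; with the tag, each PID gets its own series and all eight points are kept.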

However, if you just want to count the number of processes (and don't care about the stats), just use the exec plugin with a custom command like pgrep -c -f java. This is cheaper than having multiple time series (with pid_tag you end up with one per PID).

Upvotes: 1
