Reputation: 315
In my setup, there is an ESP32 device measuring temperature. To save battery capacity, it takes a reading every 5 minutes and goes into deep sleep in between. The temperature value is published to an MQTT topic (devices/terasa/shield) right after the measurement is taken. For some reason, the MQTT connect/publish operation sometimes fails, which means there are periods of up to 30-ish minutes during which the MQTT topic does not receive any message from the device.
To collect these messages, prometheus-mqtt-exporter is used with the following setup:
mqtt:
  # The MQTT broker to connect to
  server: tcp://localhost:1883
  # The Topic path to subscribe to. Be aware that you have to specify the wildcard.
  topic_path: devices/#
  # Optional: Regular expression to extract the device ID from the topic path. The default regular expression, assumes
  # that the last "element" of the topic_path is the device id.
  # The regular expression must contain a named capture group with the name deviceid
  # For example the expression for tasamota based sensors is "tele/(?P<deviceid>.*)/.*"
  device_id_regex: "(.*/)?(?P<deviceid>.*)"
  # The MQTT QoS level
  qos: 0
cache:
  # Timeout. Each received metric will be presented for this time if no update is send via MQTT.
  # Set the timeout to -1 to disable the deletion of metrics from the cache. The exporter presents the ingest timestamp
  # to prometheus.
  timeout: 60m
# This is a list of valid metrics. Only metrics listed here will be exported
metrics:
  -
    # The name of the metric in prometheus
    prom_name: temperature
    # The name of the metric in a MQTT JSON message
    mqtt_name: temperature
    # The prometheus help text for this metric
    help: temperature reading
    # The prometheus type for this metric. Valid values are: "gauge" and "counter"
    type: gauge
Prometheus itself scrapes the values from prometheus-mqtt-exporter; its configuration looks like this:
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
scrape_configs:
  - job_name: mqtt
    # The MQTT based sensor publish the data only now and then.
    # scrape_interval: 300s
    # If prometheus-mqtt-exporter is installed, grab metrics from external sensors.
    static_configs:
      - targets: ['localhost:9641']
To display the latest temperature value, the Prometheus API is queried like so:
$ curl -s -G --data-urlencode 'query=temperature{sensor="shield"}' http://localhost:9090/api/v1/query | jq
Normally this returns the temperature value. However, some time after the device fails to publish the MQTT message for a given 5-minute interval (there does not seem to be a direct correlation), the API query returns an empty result:
{
"status": "success",
"data": {
"resultType": "vector",
"result": []
}
}
The prometheus-mqtt-exporter configuration above sets a 60-minute timeout for returning cached results, so I'd expect it to prevent the empty results. However, that is not the case.
At the point when the Prometheus API returns an empty result, I scrape the values from prometheus-mqtt-exporter directly using curl http://localhost:9641/metrics and it returns the value just fine.
How can I avoid the empty results in such a scenario? I tried overriding the scrape interval for the mqtt job in the Prometheus config, but it does not seem to have the desired effect.
Upvotes: 2
Views: 1618
Reputation: 17800
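Just wrap temperature{sensor="shield"} in the last_over_time function, with a lookbehind window in square brackets that exceeds the maximum expected interval between raw samples sent from the sensor. For example, last_over_time(temperature{sensor="shield"}[1h]) would return the last sample value for up to one hour after the sample has been written to Prometheus. The plain selector most likely returns nothing because an instant query only looks back over a limited window (5 minutes by default, controlled by the --query.lookback-delta flag), and the exporter presents the ingest timestamp of the last MQTT message, so the sample falls out of that window once the sensor misses a publish.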
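As a rough illustration of the query above, here is a minimal Python sketch that wraps a selector in last_over_time and reads the value out of the instant-query JSON response. The endpoint is the one from the question; the helper names (build_query, build_url, latest_value, fetch_latest) are made up for this example and are not part of any Prometheus client library.

```python
import json
import urllib.parse
import urllib.request

# Same instant-query endpoint as the curl command in the question.
PROMETHEUS_URL = "http://localhost:9090/api/v1/query"


def build_query(selector: str, window: str = "1h") -> str:
    """Wrap a series selector in last_over_time with a lookbehind window."""
    return f"last_over_time({selector}[{window}])"


def build_url(query: str, base_url: str = PROMETHEUS_URL) -> str:
    """URL-encode the PromQL expression, like curl's --data-urlencode."""
    return base_url + "?" + urllib.parse.urlencode({"query": query})


def latest_value(response: dict):
    """Return the first series value as a float, or None for an empty result."""
    results = response.get("data", {}).get("result", [])
    if not results:
        return None
    # An instant vector sample is a [timestamp, "value-as-string"] pair.
    _timestamp, value = results[0]["value"]
    return float(value)


def fetch_latest(selector: str, window: str = "1h"):
    """Run the wrapped query against Prometheus (needs a reachable server)."""
    url = build_url(build_query(selector, window))
    with urllib.request.urlopen(url, timeout=10) as resp:
        return latest_value(json.load(resp))
```

Calling fetch_latest('temperature{sensor="shield"}') against a running Prometheus should then return the last reading as long as one was ingested within the past hour, and None otherwise.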
In this case it is OK to increase the scrape_interval for scraping the mqtt_exporter from 15s to 5m. This will reduce the number of temperature samples stored in Prometheus by a factor of 5m/15s = 20.
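Also try increasing the scrape_timeout in the Prometheus config from the default 10s to something like 1m. This may help preserve scraped samples in case mqtt_exporter responds slowly. Note that Prometheus rejects a scrape_timeout larger than the scrape_interval, so a 1m timeout only works once the interval has been raised.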
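The interval and timeout arithmetic can be checked with a small Python sketch. It assumes simple single-unit durations such as 15s or 5m, and the helper names are hypothetical. It also encodes the Prometheus rule that scrape_timeout must not be greater than scrape_interval, which is why a 1m timeout is only accepted after the interval is raised from 15s.

```python
import re

# Seconds per unit for the simple duration strings used in this thread.
_UNITS = {"s": 1, "m": 60, "h": 3600}


def seconds(duration: str) -> int:
    """Convert a single-unit duration such as '15s' or '5m' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def reduction_factor(old_interval: str, new_interval: str) -> float:
    """How many times fewer scrapes happen after lengthening the interval."""
    return seconds(new_interval) / seconds(old_interval)


def timeout_is_valid(scrape_interval: str, scrape_timeout: str) -> bool:
    """Prometheus rejects a scrape_timeout greater than scrape_interval."""
    return seconds(scrape_timeout) <= seconds(scrape_interval)


print(reduction_factor("15s", "5m"))   # 20.0
print(timeout_is_valid("5m", "1m"))    # True
print(timeout_is_valid("15s", "1m"))   # False
```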
P.S. It may be better to push the collected measurements directly from the sensors to a time series database that supports data push. This would increase the reliability of the system by removing two moving parts from the data delivery path - mqtt and mqtt_exporter. For example, you can push measurements directly to VictoriaMetrics - a Prometheus-like monitoring system I work on. It supports widely used data push protocols - see these docs.
See also vmagent, which can help with collecting and buffering data at the edge when there is no reliable connection to a centralized monitoring system. See these docs for details.
Upvotes: 3