jellineksara

Reputation: 11

Prometheus stale metrics with remote-write-receiver vs active scraping

I have a question about the different behavior I observe when I use Prometheus only as a database (remote-write-receiver enabled) versus as a metric collector (Prometheus actively scrapes an endpoint).

I have two dummy setups (as docker containers):

  1. Prometheus (v2.40.0) is configured to scrape the prometheus_exporter output of a Fluent Bit (v2.0.3) service.
  2. Prometheus (v2.40.0) is started with the --enable-remote-write-receiver flag, and a Fluent Bit (v2.0.3) service writes the same data as in setup 1 to Prometheus's remote write endpoint.

When I stop Fluent Bit in setup 1 and plot the graph of a selected metric, the graph breaks at the point in time where Fluent Bit was stopped. In setup 2, however, the same actions result in Prometheus continuing to draw the graph, returning the last received value for 5 more minutes.

If I understand correctly, what happens in setup 2 is the expected behavior when a metric goes stale. However, I would expect the same behavior in setup 1 as well, since I haven't changed query.lookback-delta in either setup.

I tried reading the documentation, but I cannot find a clear explanation for this difference, though that might be due to my lack of domain knowledge in Prometheus. :(

I would really appreciate it if anyone could help me understand what causes these distinct behaviors. I'm sorry if this is a silly question, I'm just starting to get acquainted with Prometheus.

Upvotes: 1

Views: 598

Answers (1)

markalex

Reputation: 13351

Please look at the official documentation on staleness.

Your case is covered by this sentence:

If a target scrape or rule evaluation no longer returns a sample for a time series that was previously present, that time series will be marked as stale.

In your first case, Prometheus tried to scrape the target, the scrape failed because the target did not respond, and Prometheus automatically marked all time series from that target as stale.

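In your second case, Prometheus cannot apply this staleness check because nothing signals that the series has ended: with remote write it only receives whatever samples are pushed, and no scrape ever fails. As a result, queries keep returning the last received sample until the lookback delta (5 minutes by default) expires.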
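To make the difference concrete, here is a rough toy model in Python of how an instant query might pick a value. It is not Prometheus code, only a sketch under assumptions: `instant_value`, `STALE`, and the sample lists are made-up names, a stale marker is modeled as `None`, the sender in setup 2 is assumed to push no stale markers, and the exact boundary handling of the lookback window is simplified.

```python
from typing import List, Optional, Tuple

# Default value of --query.lookback-delta is 5 minutes.
LOOKBACK_DELTA = 300.0  # seconds

# Hypothetical stand-in for the special stale-marker sample.
STALE = None

# (timestamp in seconds, value or STALE)
Sample = Tuple[float, Optional[float]]


def instant_value(samples: List[Sample], t: float) -> Optional[float]:
    """Return the value a query at time t would see, or None for a gap."""
    # Only samples at or before the query time are candidates.
    candidates = [s for s in samples if s[0] <= t]
    if not candidates:
        return None
    # Take the most recent candidate.
    ts, value = max(candidates, key=lambda s: s[0])
    # A stale marker ends the series immediately.
    if value is STALE:
        return None
    # Without a marker, the sample is reused until the lookback window passes.
    if t - ts > LOOKBACK_DELTA:
        return None
    return value


# Setup 1: scraped every 15s; the failed scrape at t=75 writes a stale marker.
scraped: List[Sample] = [(0, 1.0), (15, 2.0), (30, 3.0), (45, 4.0), (60, 5.0), (75, STALE)]

# Setup 2: the same samples pushed via remote write, with no stale marker.
pushed: List[Sample] = [(0, 1.0), (15, 2.0), (30, 3.0), (45, 4.0), (60, 5.0)]

for t in (60, 90, 400):
    print(t, instant_value(scraped, t), instant_value(pushed, t))
```

In this sketch the scraped series already has a gap at t=90, while the pushed series keeps returning the last value there and only disappears once more than 300 seconds have passed since its last sample.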

Upvotes: 0
