Reputation: 11
I have a question about the different behaviors I observe when I use Prometheus only as a database (remote-write-receiver enabled) vs. as a metric collector (Prometheus actively scrapes an endpoint).
I have two dummy setups (both running as Docker containers), sketched roughly below:
1. Prometheus actively scrapes metrics from Fluent Bit.
2. Fluent Bit pushes metrics to Prometheus via remote write (remote-write-receiver enabled); nothing is scraped.
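To make this concrete, here is a minimal sketch of what I mean; the job name, ports, and endpoint path are placeholders from my dummy configs, nothing authoritative:

```yaml
# Setup 1 (scrape): prometheus.yml pulls from Fluent Bit's metrics endpoint.
# Host, port, and path are placeholders for my dummy setup.
scrape_configs:
  - job_name: fluent-bit
    metrics_path: /api/v1/metrics/prometheus   # Fluent Bit's built-in Prometheus-format endpoint
    static_configs:
      - targets: ['fluent-bit:2020']

# Setup 2 (remote write): no scrape_configs for these metrics at all.
# Prometheus is started with the remote-write receiver enabled
# (--web.enable-remote-write-receiver, or --enable-feature=remote-write-receiver
# on older versions) and Fluent Bit's prometheus_remote_write output pushes
# to http://prometheus:9090/api/v1/write.
```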
When I stop Fluent Bit in setup 1 and plot the graph of a selected metric, the graph breaks at the point in time where Fluent Bit was stopped. In setup 2, however, the same action results in Prometheus still drawing the graph, returning the last received value for 5 more minutes.
If I understand correctly, what happens in setup 2 is the expected behavior when a metric goes stale. However, to my understanding this should be the expected behavior in setup 1 as well, since I haven't reconfigured query.lookback-delta in either setup.
I tried reading the documentation, but I cannot find a clear explanation for this difference, though that might be due to my lack of domain knowledge in Prometheus. :(
I would really appreciate it if anyone could help me understand what causes these distinct behaviors. Sorry if this is a silly question; I'm just starting to get acquainted with Prometheus.
Upvotes: 1
Views: 598
Reputation: 13351
Please take a look at the official documentation about staleness.
Your case is covered by this sentence:
If a target scrape or rule evaluation no longer returns a sample for a time series that was previously present, that time series will be marked as stale.
In your first case, Prometheus tried to scrape data from the target; the target didn't respond, so the scrape returned no metrics, and Prometheus automatically marked all time series belonging to that target as stale.
In your second case, Prometheus cannot apply the staleness check because it has no scrape or rule evaluation to base it on, so it only stops returning results once the lookback-delta expires.
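If you want to see the effect of that window directly, you can shorten it when starting Prometheus. A minimal sketch in docker-compose style (the image tag and the 2m value are just examples, not a recommendation):

```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --web.enable-remote-write-receiver   # or --enable-feature=remote-write-receiver on older versions
      - --query.lookback-delta=2m            # default is 5m, which is the 5 extra minutes you observed
```

With that in place, the graph in the remote-write setup would stop roughly 2 minutes after Fluent Bit stops sending data, instead of 5.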
Upvotes: 0