Reputation: 1717
I have a metric from Grafana Loki named logs_bytes_over_time with two labels, interval and service. The interval is 1m. All services have some retention; app1 and app2 have a retention of 336 hours (14 days). When I compute the sum over time, the result shows about 8 GB for app1 and 5 GB for app2:
curl http://victoria-metrics:8428/prometheus/api/v1/query -d 'query=sum_over_time(logs_bytes_over_time{interval="1m", service=~"app.*"}[14d])' | jq .
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "interval": "1m",
          "service": "app1"
        },
        "value": [
          1672315519,
          "8593470346"
        ]
      },
      {
        "metric": {
          "interval": "1m",
          "service": "app2"
        },
        "value": [
          1672315519,
          "5498422093"
        ]
      }
    ]
  }
}
When I fetch the metric values from the VictoriaMetrics API for the last 336 hours, I get different values:
42.33gb
20.87gb
These results come from samples downloaded from the VictoriaMetrics API:
curl http://victoria-metrics:8428/prometheus/api/v1/query_range -d 'query=logs_bytes_over_time{interval="1m", service=~"app.*"}' -d 'start=Xh' -d 'stop=Yh' -d 'step=1m'
Here X and Y iterate over 24h intervals to make the requests easier on VictoriaMetrics. These are the (X, Y) pairs I iterate over:
[('-336h', '-312h'), ('-312h', '-288h'), ('-288h', '-264h'), ('-264h', '-240h'), ('-240h', '-216h'), ('-216h', '-192h'), ('-192h', '-168h'), ('-168h', '-144h'), ('-144h', '-120h'), ('-120h', '-96h'), ('-96h', '-72h'), ('-72h', '-48h'), ('-48h', '-24h'), ('-24h', '0')]
I simply sum all the values I receive. I did the sum in both Python and bash to be sure I did not make a mistake in the script; the results are the same.
Why are the sum of the values from the VictoriaMetrics API and the result of the sum_over_time query so different? I would expect them to be the same, or at least much closer to each other.
Upvotes: 1
Views: 2775
Reputation: 17800
The /api/v1/query_range endpoint doesn't return the raw samples stored in VictoriaMetrics. It returns calculated values at timestamps t=[start, start+step, start+2*step, ..., end]. More specifically, for each timestamp t from that list it returns the last raw sample value on the time range (t-scrape_interval ... t], where scrape_interval is the median interval between raw samples. Note that t-scrape_interval isn't included in the time range, while t is included. See these docs for more details.
sum_over_time(m[d]) returns the sum of raw samples on the time range (t-d ... t] when queried at the timestamp t. See these docs for more details.
It is likely that the interval between raw samples in the queried time series exceeds the step value passed to /api/v1/query_range. This results in duplicate output values for each raw sample stored in VictoriaMetrics.
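The duplication effect can be illustrated with a small simulation. This is only a sketch of the behaviour described above, not VictoriaMetrics code: the helper names, the 300-second sample spacing and the 100-byte values are all made up for illustration, and the real engine has more nuances than this model.

```python
from statistics import median


def sum_over_time(samples, t, d):
    """Sum raw sample values whose timestamps fall in (t - d, t]."""
    return sum(v for ts, v in samples if t - d < ts <= t)


def query_range(samples, start, end, step):
    """Simplified model of a range query: for each t in
    start, start+step, ..., end, return the last raw sample
    in (t - scrape_interval, t], if there is one."""
    timestamps = [ts for ts, _ in samples]
    scrape_interval = median(b - a for a, b in zip(timestamps, timestamps[1:]))
    points = []
    t = start
    while t <= end:
        window = [v for ts, v in samples if t - scrape_interval < ts <= t]
        if window:
            points.append((t, window[-1]))
        t += step
    return points


# Hypothetical series: one 100-byte sample every 300 s for one hour.
samples = [(ts, 100) for ts in range(300, 3601, 300)]

# Sum of the 12 raw samples in (0, 3600].
raw_total = sum_over_time(samples, t=3600, d=3600)

# Range query with a 60 s step: most raw samples are returned
# at five consecutive timestamps, so summing the output overcounts.
ranged = query_range(samples, start=60, end=3600, step=60)
ranged_total = sum(v for _, v in ranged)

print(raw_total, ranged_total, len(ranged))
```

In this toy setup the summed range-query output is several times larger than the sum of raw samples, purely because the step (60 s) is smaller than the spacing between samples (300 s).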
VictoriaMetrics provides export APIs, which can be used for exporting raw samples for the given time series - see these docs and this article for details. Try exporting raw samples with these APIs and verifying whether the sum of raw samples matches the value returned by sum_over_time(m[d]).
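A minimal sketch of that check, assuming the JSON-lines output of /api/v1/export, where each line carries a "metric" object plus parallel "values" and "timestamps" arrays. The function name and the sample data are hypothetical, and the curl command in the comment is an untested assumption modelled on the query in the question.

```python
import json
from collections import defaultdict


def sum_exported_samples(export_text, label="service"):
    """Sum raw sample values per label value from JSON-lines export output."""
    totals = defaultdict(float)
    for line in export_text.splitlines():
        line = line.strip()
        if not line:
            continue
        block = json.loads(line)
        # The same series may show up on more than one line, so accumulate.
        totals[block["metric"].get(label, "")] += sum(block["values"])
    return dict(totals)


# Assumed way to obtain the data (adjust host, selector and time range):
#   curl http://victoria-metrics:8428/api/v1/export \
#     -d 'match[]=logs_bytes_over_time{interval="1m", service=~"app.*"}' \
#     -d 'start=-336h' > export.jsonl
# Made-up sample output, used here instead of a live server:
sample_export = "\n".join([
    '{"metric":{"__name__":"logs_bytes_over_time","interval":"1m","service":"app1"},'
    '"values":[100,200],"timestamps":[1672000000000,1672000300000]}',
    '{"metric":{"__name__":"logs_bytes_over_time","interval":"1m","service":"app1"},'
    '"values":[300],"timestamps":[1672000600000]}',
    '{"metric":{"__name__":"logs_bytes_over_time","interval":"1m","service":"app2"},'
    '"values":[50],"timestamps":[1672000000000]}',
])

totals = sum_exported_samples(sample_export)
print(totals)
```

The per-service totals computed this way are the numbers to compare against the sum_over_time result for the same time range.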
Upvotes: 3