Reputation: 11561
I'm trying to ingest a stream of infrequent metrics (fewer than one per minute) and query it many hours into the past. Unfortunately, I can't see anything older than 6 hours, despite trying the usual tricks I could find on Google. What's wrong with my configuration? Here are the files I use to set up my environment:
./storage-aggregation.conf
[min]
pattern = \.lower$
xFilesFactor = 0
aggregationMethod = min
[max]
pattern = \.upper(_\d+)?$
xFilesFactor = 0
aggregationMethod = max
[sum]
pattern = \.sum$
xFilesFactor = 0
aggregationMethod = sum
[count]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum
[count_legacy]
pattern = ^stats_counts.*
xFilesFactor = 0
aggregationMethod = sum
[default_average]
pattern = .*
xFilesFactor = 0
aggregationMethod = average
./docker-compose.yml
version: '3.3'
services:
  graphite:
    image: graphiteapp/graphite-statsd
    container_name: 'graphite'
    ports:
      - '2003:2003'
    volumes:
      - ./persistence/graphite/storage:/opt/graphite/storage
      - ./storage-aggregation.conf:/opt/graphite/conf/storage-aggregation.conf
      - ./storage-schemas.conf:/opt/graphite/conf/storage-schemas.conf
  grafana:
    build: './grafana'
    ports:
      - '3000:3000'
    links:
      - graphite
./storage-schemas.conf
[carbon]
pattern = ^carbon\.
retentions = 10s:6h,1m:90d
[default_1min_for_1day]
pattern = .*
retentions = 10s:1800d,1m:1800d,10m:1800d
./grafana/provisioning/datasources/all.yml
datasources:
- name: 'graphite'
  type: 'graphite'
  access: 'proxy'
  org_id: 1
  url: 'http://graphite:8080'
  is_default: true
  version: 1
  editable: true
./grafana/provisioning/dashboards/all.yml
- name: 'default'
  org_id: 1
  folder: ''
  type: 'file'
  options:
    folder: '/var/lib/grafana/dashboards'
./grafana/Dockerfile
FROM grafana/grafana:7.0.0
ADD ./provisioning /etc/grafana/provisioning
ADD ./config.ini /etc/grafana/config.ini
ADD ./dashboards /var/lib/grafana/dashboards
USER 0
RUN chmod a+w /var/lib/grafana -R /etc/grafana/config.ini
USER 472
./grafana/config.ini
[paths]
provisioning = /etc/grafana/provisioning
[server]
enable_gzip = true
[users]
default_theme = light
The dashboard is pretty much the default one. What am I missing here?
Upvotes: 0
Views: 452
Reputation: 2176
Your retentions specify a raw interval of 10s, but you're sending data less often than once a minute. This means the raw archive will look like: 0s, <value>; 10s, null; 20s, null; 30s, null; 40s, null; 50s, null; 60s, <value, but possibly also null>.
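XFF (xFilesFactor) is the fraction of slots in the finer archive that must be non-null for a rollup to produce a value. With at most one value in every six 10s slots, each 1-minute window is only about 17% known, so any whisper file whose XFF is above that (the default is 0.5) rolls up to null. An XFF of 0, as in your storage-aggregation.conf, accepts a single known value, but the setting is only applied when a whisper file is created, so files that already existed keep the XFF they were created with.
You should consider making your raw interval coarser than 10s (for example 1m), and if you want to propagate a value even though you have a ton of nulls, keep XFF low: 0.1, for instance, lets the next aggregation accept a value if at least 10% of the lower intervals are known.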
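To make the XFF rule concrete, here is a minimal Python sketch of how one 1-minute window of six 10s slots would be rolled up. It is a simplified model of the check, not whisper's actual code; `rollup` is a hypothetical helper and the sample value 42.0 is made up for illustration.

```python
from typing import List, Optional


def rollup(points: List[Optional[float]], xff: float,
           method: str = "average") -> Optional[float]:
    """Aggregate one window of finer-resolution slots into one coarser value.

    Simplified model (an assumption, not whisper's source): the result is
    null unless the fraction of known slots is at least `xff`.
    """
    known = [p for p in points if p is not None]
    if not known:
        return None  # nothing to aggregate, whatever the XFF is
    if len(known) / len(points) < xff:
        return None  # too few known slots for this XFF
    if method == "average":
        return sum(known) / len(known)
    if method == "sum":
        return sum(known)
    if method == "min":
        return min(known)
    if method == "max":
        return max(known)
    raise ValueError(f"unsupported method: {method}")


# One data point per minute stored in a 10s archive: 1 known slot out of 6.
window = [42.0, None, None, None, None, None]

for xff in (0.0, 0.1, 0.5, 0.9):
    print(f"xff={xff}: {rollup(window, xff)}")
```

Under this model the window survives the rollup with an XFF of 0 or 0.1 and becomes null with 0.5 or 0.9, because only 1/6 of the slots are known.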
Finally, your 10s:1800d,1m:1800d,10m:1800d setting doesn't make sense: the coarser archives will never be used, because all three cover 1800d and Graphite always picks the first retention that covers the query period. If you really want raw data for 1800d you can just use 10s:1800d, but that will still result in a huge and unwieldy file. I'd suggest a more reasonable schedule (short interval = short retention, longer intervals = longer retention) combined with XFF values that match your expectation for how rollups should handle null values. The size of your whisper file grows with the total number of points, which is the sum of retention / interval over all levels of aggregation.
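The retention / interval arithmetic can be checked with a short Python sketch. The helper names are hypothetical, the 12 bytes per point (a 4-byte timestamp plus an 8-byte value) is an assumption about the whisper file layout, the small file header is ignored, and `1m:30d,10m:1800d` is only an example schedule, not a recommendation tuned to your data.

```python
import re
from typing import List, Tuple

# Seconds per unit as used in retention strings such as "10s:1800d".
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}

# Assumption: each stored point takes 12 bytes (timestamp + value).
POINT_BYTES = 12


def to_seconds(spec: str) -> int:
    """Convert a duration such as '10s' or '1800d' to seconds."""
    match = re.fullmatch(r"(\d+)([smhdwy])", spec.strip())
    if match is None:
        raise ValueError(f"cannot parse duration: {spec!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def archive_points(retentions: str) -> List[Tuple[str, int]]:
    """Return (archive, number of points) for each 'interval:retention' pair."""
    result = []
    for archive in retentions.split(","):
        interval, duration = archive.strip().split(":")
        result.append((archive.strip(),
                       to_seconds(duration) // to_seconds(interval)))
    return result


def estimate_bytes(retentions: str) -> int:
    """Rough per-metric file size: total points times bytes per point."""
    return sum(points for _, points in archive_points(retentions)) * POINT_BYTES


for schedule in ("10s:1800d,1m:1800d,10m:1800d", "1m:30d,10m:1800d"):
    for archive, points in archive_points(schedule):
        print(f"  {archive}: {points} points")
    print(f"{schedule}: about {estimate_bytes(schedule) / 1e6:.1f} MB per metric")
```

With these assumptions the original schedule comes to roughly 220 MB per metric, most of it in the 10s archive, while the example schedule stays under 4 MB.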
Upvotes: 1