Tanmay

Reputation: 393

Grafana timeout while querying a large amount of logs from Loki

I have a Loki server running on AWS Graviton (arm, 4 vCPU, 8 GiB) configured as follows:

common:
  replication_factor: 1
  ring:
    kvstore:
      store: etcd
      etcd:
        endpoints: ['127.0.0.1:2379']

storage_config:
  boltdb_shipper:
    active_index_directory: /opt/loki/index
    cache_location: /opt/loki/index_cache
    shared_store: s3

  aws:
    s3: s3://ap-south-1/bucket-name

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h # 7d
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20
  per_stream_rate_limit: 8MB
  
ingester:
  lifecycler:
    join_after: 30s
  chunk_block_size: 10485760

compactor:
  working_directory: /opt/loki/compactor
  shared_store: s3
  compaction_interval: 5m

schema_config:
  configs:
    - from: 2022-01-01
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: loki_
        period: 24h

table_manager:
  retention_period: 360h #15d
  retention_deletes_enabled: true
  index_tables_provisioning: # unused
    provisioned_write_throughput: 500
    provisioned_read_throughput: 100
    inactive_write_throughput: 1
    inactive_read_throughput: 100

Ingestion is working fine, and I'm able to query logs over long durations from streams with small amounts of data. I'm also able to query short durations of logs from streams with TiBs of data.

I see the following errors in Loki when I try to query 24h of data from a large stream, and Grafana times out after 5 minutes:

Feb 11 08:27:32 loki-01 loki[19490]: level=error ts=2022-02-11T08:27:32.186137309Z caller=retry.go:73 org_id=fake msg="error processing request" try=2 err="context canceled"
Feb 11 08:27:32 loki-01 loki[19490]: level=info ts=2022-02-11T08:27:32.186304708Z caller=metrics.go:92 org_id=fake latency=fast query="{filename=\"/var/log/server.log\",host=\"web-199\",ip=\"192.168.20.239\",name=\"web\"} |= \"attachDriver\"" query_type=filter range_type=range length=24h0m0s step=1m0s duration=0s status=499 limit=1000 returned_lines=0 throughput=0B total_bytes=0B
Feb 11 08:27:32 loki-01 loki[19490]: level=info ts=2022-02-11T08:27:32.23882892Z caller=metrics.go:92 org_id=fake latency=slow query="{filename=\"/var/log/server.log\",host=\"web-199\",ip=\"192.168.20.239\",name=\"web\"} |= \"attachDriver\"" query_type=filter range_type=range length=24h0m0s step=1m0s duration=59.813829694s status=400 limit=1000 returned_lines=153 throughput=326MB total_bytes=20GB
Feb 11 08:27:32 loki-01 loki[19490]: level=error ts=2022-02-11T08:27:32.238959314Z caller=scheduler_processor.go:199 org_id=fake msg="error notifying frontend about finished query" err="rpc error: code = Canceled desc = context canceled" frontend=192.168.5.138:9095
Feb 11 08:27:32 loki-01 loki[19490]: level=error ts=2022-02-11T08:27:32.23898877Z caller=scheduler_processor.go:154 org_id=fake msg="error notifying scheduler about finished query" err=EOF addr=192.168.5.138:9095

Query: {filename="/var/log/server.log",host="web-199",ip="192.168.20.239",name="web"} |= "attachDriver"

Is there a way to stream the results instead of waiting for the response? Can I optimize Loki to process such queries better?

Upvotes: 15

Views: 15976

Answers (1)

gautam

Reputation: 402

For Grafana Loki timeouts, increase the timeout values in the configuration.
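A minimal sketch of the settings involved, assuming Loki 2.x; the 5m values below are placeholders, not recommendations:

server:
  # HTTP server timeouts; long range queries get cancelled when these expire
  http_server_read_timeout: 5m
  http_server_write_timeout: 5m

querier:
  # per-query timeout inside the querier
  query_timeout: 5m
  engine:
    # LogQL engine timeout
    timeout: 5m

Grafana has its own limit as well: the data proxy timeout in grafana.ini ([dataproxy] timeout, in seconds) may also need to be raised, otherwise Grafana can still cancel the request before Loki finishes.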

Upvotes: 0
