Slow aggregation query and high cpu consumption in influxDB

Question

We have a very annoying issue with InfluxDB (no clustering). Our InfluxDB version is 1.7.

As part of our business requirements, we are storing points (or data) of events. And from time to time we have peaks (~20 millions events in a day).

We have 2 keys of tags - tag1, tag2 and key value - value1.

Our cardinality after running:

SHOW SERIES CARDINALITY ON db1

is 24.

And specifically for this measurement 'measurement1' with the 20 million points we have only 1 series (we have only one pair of values of keys tag1 and tag2):

measurement1,tag1=tag1value,tag2=tag2value

Now, as I said if for a selected period we can have up to 20 millions points.

SELECT SUM(someDoubleValue) AS result 
FROM measurement1 
WHERE time > '2019-04-15T21:00:00Z' 
  AND time < '2019-05-17T20:59:59.999Z' 
  AND (tag1 = '1234567') 
GROUP BY time(30d, 21h) FILL(0);

This query is returned after 6-7 seconds but it consumes 80-100% of our cpu.

Now the cardinality is low (perhaps I may be wrong - is there another way to validate that our cardinality is low?).

Another issue that we want to fix is that in some of our flows, we can have 2-3 of these queries to be called on parallel. This of course results in a failure (timeout of our client) for all of them.

We tried to figure out if it was a sharding issue - we are using a default shard of 1 week so we don't think that querying 2 months should be an issue.

We increased the machine power but with no luck - from i3-large to i3-xlarge (AWS).

We are trying to figure out if it's an issue in InfluxDB or it can be some wrong configuration of our measurement.

Slow aggregation query and high cpu consumption in influxDB

Answers (1)

Related Questions