Reputation: 512
We have a very annoying issue with InfluxDB (no clustering). Our InfluxDB version is 1.7.
As part of our business requirements, we are storing points (or data) of events. And from time to time we have peaks (~20 millions events in a day).
We have 2 keys of tags - tag1, tag2 and key value - value1.
Our cardinality after running:
SHOW SERIES CARDINALITY ON db1
is 24.
And specifically for this measurement 'measurement1' with the 20 million points we have only 1 series (we have only one pair of values of keys tag1 and tag2):
measurement1,tag1=tag1value,tag2=tag2value
Now, as I said if for a selected period we can have up to 20 millions points.
SELECT SUM(someDoubleValue) AS result
FROM measurement1
WHERE time > '2019-04-15T21:00:00Z'
AND time < '2019-05-17T20:59:59.999Z'
AND (tag1 = '1234567')
GROUP BY time(30d, 21h) FILL(0);
This query is returned after 6-7 seconds but it consumes 80-100% of our cpu.
Now the cardinality is low (perhaps I may be wrong - is there another way to validate that our cardinality is low?).
Another issue that we want to fix is that in some of our flows, we can have 2-3 of these queries to be called on parallel. This of course results in a failure (timeout of our client) for all of them.
We tried to figure out if it was a sharding issue - we are using a default shard of 1 week so we don't think that querying 2 months should be an issue.
We increased the machine power but with no luck - from i3-large to i3-xlarge (AWS).
We are trying to figure out if it's an issue in InfluxDB or it can be some wrong configuration of our measurement.
Upvotes: 1
Views: 1378
Reputation: 22562
Are you using the tsm
index? Our tsi
index offers performance improvements, that may help.
https://docs.influxdata.com/influxdb/v1.7/concepts/tsi-details/
Upvotes: 0