Cary Li
Cary Li

Reputation: 261

InfluxDB performance

For my case, I need to capture 15 performance metrics for devices and save it to InfluxDB. Each device has a unique device id.

Metrics are written into InfluxDB in the following way. Here I only show one as an example

new Serie.Builder("perfmetric1")
    .columns("time", "value", "id", "type")
    .values(getTime(), getPerf1(), getId(), getType())
    .build()

Writing data is fast and easy. But I saw bad performance when I run query. I'm trying to get all 15 metric values for the last one hour.

select value from perfmetric1, perfmetric2, ..., permetric15
where id='testdeviceid' and time > now() - 1h

For an hour, each metric has 120 data points, in total it's 1800 data points. The query takes about 5 seconds on a c4.4xlarge EC2 instance when it's idle.

I believe InfluxDB can do better. Is this a problem of my schema design, or is it something else? Would splitting the query into 15 parallel calls go faster?

Upvotes: 5

Views: 3757

Answers (2)

dukebody
dukebody

Reputation: 7185

As @valentin answer says, you need to build an index for the id column for InfluxDB to perform these queries efficiently.

In 0.8 stable you can do this "indexing" using continuous fanout queries. For example, the following continuous query will expand your perfmetric1 series into multiple series of the form perfmetric1.id:

select * from perfmetric1 into perfmetric1.[id];

Later you would do:

select value from perfmetric1.testdeviceid, perfmetric2.testdeviceid, ..., permetric15.testdeviceid where time > now() - 1h

This query will take much less time to complete since InfluxDB won't have to perform a full scan of the timeseries to get the points for each testdeviceid.

Upvotes: 4

valentin
valentin

Reputation: 3608

Build an index on id column. Seems that he engine uses full scan on table to retrieve data. By splitting your query in 15 threads, the engine will use 15 full scans and the performance will be much worse.

Upvotes: 2

Related Questions