eselk
eselk

Reputation: 6894

InfluxDB average of distinct count over time

Using Influx DB v0.9, say I have this simple query:

select count(distinct("id")) FROM "main" WHERE time > now() - 30m and time < now() GROUP BY time(1m)

Which gives results like:

08:00 5
08:01 10
08:02 5
08:03 10
08:04 5

Now I want a query that produces points with an average of those values over 5 minutes. So the points are now 5 minutes apart, instead of 1 minute, but are an average of the 1 minute values. So the above 5 points would be 1 point with a value of the result of (5+10+5+10+5)/5.

This does not produce the results I am after, for clarity, since this is just a count, and I'm after the average.

select count(distinct("id")) FROM "main" WHERE time > now() - 30m and time < now() GROUP BY time(5m)

This doesn't work (gives errors):

select mean(distinct("id")) FROM "main" WHERE time > now() - 30m and time < now() GROUP BY time(5m)

Also doesn't work (gives error):

select mean(count(distinct("id"))) FROM "main" WHERE time > now() - 30m and time < now() GROUP BY time(5m)

In my actual usage "id" is a string (content, not a tag, because count distinct not supported for tags in my version of InfluxDB).

Upvotes: 2

Views: 5930

Answers (1)

beckettsean
beckettsean

Reputation: 1846

To clarify a few points for readers, in InfluxQL, functions like COUNT() and DISTINCT() can only accept fields, not tags. In addition, while COUNT() supports the nesting of the DISTINCT() function, most nested or sub-functions are not yet supported. In addition, nested queries, subqueries, or stored procedures are not supported.

However, there is a way to address your need using continuous queries, which are a way to automate the processing of data and writing those results back to the database.

First take your original query and make it a continuous query (CQ).

CREATE CONTINUOUS QUERY count_foo ON my_database_name BEGIN SELECT COUNT(DISTINCT("id")) AS "1m_count" INTO main_1m_count FROM "main" GROUP BY time(1m) END

There are other options for the CQ, but that basic one will wake up every minute, calculate the COUNT(DISTINCT("id")) for the prior minute, and then store that result in a new measurement, main_1m_count.

Now, you can easily calculate your 5 minute mean COUNT from the pre-calculated 1 minute COUNT results in main_1m_count:

SELECT MEAN("1m_count") FROM main_1m_count WHERE time > now() - 30m GROUP BY time(5m)

(Note that by default, InfluxDB uses epoch 0 and now() as the lower and upper time range boundaries, so it is redundant to include and time < now() in the WHERE clause.)

Upvotes: 5

Related Questions