Reputation: 4495
I'm trying to write aggregated result from two measurements into a single measurement.
I found on documentation that you can write multiple matching measurements with :MEASUREMENT
keyword in INTO query. Like
SELECT * INTO "copy_NOAA_water_database"."autogen".:MEASUREMENT FROM
"NOAA_water_database"."autogen"./.*/
What I'm trying to do is aggregate from multiple measurements and write result to a single measurement.
SELECT mean("water_level") INTO
"copy_NOAA_water_database"."autogen"."water_agg" FROM
"NOAA_water_database"."autogen"./.*/ GROUP BY time(15m), *
The above query runs successfully, but I'm not sure whether influx has considered points from all measurement of NOAA_water_database or just last appearing measurement is considered.
Upvotes: 1
Views: 1543
Reputation: 19228
Q: I'm not sure whether influx has considered points from all measurement of NOAA_water_database or just last appearing measurement is considered.
A: I suspect influxdb
is not aggregating the data from your measurements.
I think it is only aggregating the data from each measurement individually and then for each output write it to your specified measurement and since the resolved time
of the mean
operation can possibly be the same, measurement B's result can overwrite measurement A's result.
I derived this theory by doing an experiment using the following dataset;
INSERT cpu,host=serverA value=10
INSERT cpu,host=serverA value=20
INSERT cpu2,host=serverA value=5
INSERT cpu2,host=serverA value=15
Doing a SELECT
statement similar to your query above returns;
select * FROM "historian"."autogen"./cpu.*/
name: cpu
time host value
---- ---- -----
1546511130857357196 serverA 10
1546511132744883738 serverA 20
name: cpu2
time host value
---- ---- -----
1546511156629403118 serverA 5
1546511157888695746 serverA 15
Then instead of using mean
I do sum
to find test the behaviour of influxdb.
I also simplified the query by dropping the groupBy
operation.
Doing a sum
gives me;
SELECT sum("value") INTO test_sum FROM "historian"."autogen"./.*/
name: result
time written
---- -------
0 2
> select * from test_sum;
name: test_sum
time sum
---- ---
0 20
Theory: if influx
is aggregating the data from all measurements, the sum result would not be 20
. It should be 50
. The only way 20 can be derived is from by summing 5 + 15
which is the data from the last measurement.
But when we do the sum
operation, influx
did told us 2 rows were written. My theory to this is that, the influx did calculate the sum of the first measurement however as first and second summation's result time is both 0
therefore the 2nd measurement's result would have overwritten the first result's.
Recommended solution: The best tool to do this job is actually influxdb's kapacitor. It is a great tool because it is fast however it is also extremely to learn.
Alternatively if your dataset isn't huge which I suspect it should be alright since you are grouping
by 15m. You can write a script in your favourite programming language to read out the data, do the mean
and then write the data back to influxdb
.
Upvotes: 1