David Shard
David Shard

Reputation: 101

Working with sparse timeseries data in InfluxDB

We store data of our IIOT devices in InfluxDB.

Our IIOT devices are only sending the data that changed. So unchanged data are "not" repeated most of the time.

Example:

time                 malfunction_status service_status warning_status
----                 ------------------ -------------- --------------
2020-02-27T14:55:38Z 0                  0              0
2020-02-27T14:55:39Z 0                  1              0
2020-02-27T14:55:57Z 1                  0              1
2020-02-27T14:56:08Z 1                  1              1
2020-02-27T14:56:24Z 1                  1              1
2020-02-27T14:57:38Z                    0
2020-02-27T14:57:50Z 0                                 0
2020-02-27T14:57:56Z                    1              1
2020-02-27T14:58:19Z 1                  0              0
2020-02-27T15:01:16Z 0                  1
2020-02-27T15:01:25Z                                   1
2020-02-27T15:01:54Z 1                  0              0
2020-02-27T15:02:34Z 0                                 1
2020-02-27T15:02:52Z 1                  1              0
2020-02-27T15:02:53Z 0                  0              1
2020-03-02T15:33:37Z 0                  0              1
2020-03-02T15:33:57Z                    1
2020-03-02T15:33:58Z 1
2020-03-02T15:35:21Z 0                  0              0
2020-03-02T15:35:38Z                    1              1
2020-03-02T15:35:49Z 1                                 0
2020-03-02T15:35:53Z                    0              1
2020-03-02T15:36:16Z                    1              0
2020-03-03T08:52:40Z 0                  0
2020-03-03T08:53:24Z 1
2020-03-03T08:53:33Z                    1              1
2020-03-03T08:55:16Z                    0              0
2020-03-03T08:55:29Z 0                                 1
2020-03-03T08:55:57Z 1                                 0
2020-03-03T08:56:11Z                    1              1
2020-03-03T08:58:04Z 0
2020-03-03T08:58:35Z                                   0
2020-03-03T09:02:37Z 1                  0              1
2020-03-03T09:02:52Z 0                  1              0
2020-03-03T09:03:02Z 1                  0              1
2020-03-03T09:03:06Z 0                                 0
2020-03-03T09:03:37Z 1                  1              1
2020-03-03T09:04:42Z                                   0
2020-03-03T09:04:45Z 0                  0
2020-03-03T09:05:04Z 1
2020-03-03T09:05:15Z                    1              1
2020-03-03T09:06:26Z 0                  0              0

It seems to be not possible to find all occurrences of each _status being '0' with a simple query.

select /_status/ from iot_data where mId='sim_az1_0' and malfunction_status=0 and service_status=0 and warning_status=0

yields only the rows that have concrete values of '0' for each status.

time                 malfunction_status service_status warning_status
----                 ------------------ -------------- --------------
2020-02-27T14:55:38Z 0                  0              0
2020-03-02T15:35:21Z 0                  0              0
2020-03-03T09:06:26Z 0                  0              0

At least the result:

2020-03-03T09:04:42Z                                   0
2020-03-03T09:04:45Z 0                  0

for 2020-03-03T09:04:45Z is missing.

It would only be the last option to fill up all column values while inserting the data to Influx as we record hundreds of columns.

My solution would to be currently to fetch the fields from Influx and evaluate them outside of the select.

Are the elegant solution with influx queries?

Thanks!

Upvotes: 1

Views: 531

Answers (1)

Nathan Pegram
Nathan Pegram

Reputation: 69

The delta based encoding that you use is an efficient way to send data that changes very little from point to point, but it raises the question of what timestamps you are actually looking for. The data encoding implies that there are not just time stamps, but time spans where all three statuses are 0. You can use a subquery with time grouping and fill function to create a series with uniform time spacing and no nulls. If you use a time period that is smaller than your expected interval, you shouldn't lose any of your target values:

select ms, ss, ws from 
(select min(malfunction_status) as ms, min(service_status) as ss, min(warning_status) as ws from iot_data where mId='sim_az1_0' group by time(1s) fill(previous))
where ms=0 and ss=0 and ms=0

Of course, that's going to give you a lot of 1 second timestamps in a row for each "occurrence" of all zero statuses.

https://docs.influxdata.com/influxdb/v1.7/query_language/data_exploration/#group-by-time-intervals-and-fill https://docs.influxdata.com/influxdb/v1.7/query_language/functions/#difference

Upvotes: 0

Related Questions