Blue_Togepi
Blue_Togepi

Reputation: 51

InfluxDB2.0: How to sum up multiple time series with irregular time interval?

TL;DR I'm using Influxdb v2.0 and use the Influx query syntax (as in the GUI). I'm having multiple series (same _field, different tag) of digital 0/1 state, and I want to sum them up. The problem is that the state is stored in the database with irregular time interval, meaning that for any time the real actual value for each tag should be queried with the last point possible. I have tried aggregateWindow with 'last' as the function but last just drop table for the windows with no point stored. Is there anyway I could sum them up? I accept any method (including exporting the data and use other language script instead lmao). Thank you in advance.


The Scenario

My team had implemented a check-in/check-out system with phone number representing each person for a real world event earlier, and had decided to use InfluxDB v2.0 as a database (We choose it so we can monitor through Grafana easily). I have a bucket storing points of checkin/checkout value, all the same schema. The schema is as followed:

measurement: 'user'
tags: [phone, type]     // type is either ['normal', 'staff']
value: 0 or 1           // 0 for checking out event, 1 for checking in event

Whenever someone checks in the event, a point of value 1 is inserted, and vice versa, a point of value 0 is inserted whenever someone checks out the event. Keep in mind that the point can duplicate if user decided to trigger the api again like having already checked in earlier and check in again (although, we view this as having the same state of 1). So the data is like a digital 0/1 state but with irregular time interval of points, one graph line for each phone number. Same phone numbers but with different types are seen as different person for us.

The project had already deployed and we are tasked to do post-processing with the data. The problem is to visualize a graph of in-event population for the entire time. In mathematic point of view, this should be easily solved by summing all the state of each person (The 0/1 line) over time. I first tried something like this in the Influx query:

from(bucket: "event_name")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "user")
  |> group(columns: ["type"])
  |> aggregateWindow(every: v.windowPeriod, fn: sum, createEmpty: true)
  |> yield()

The result looks very promising, a population graph with 2 colours of type normal and staff. But when I look carefully, the sum function of Influx actually sum the _value of each point in each window. Meaning that for some window with no point, the sum function does not actually sum up everyone in the database. The goal is to sum the actual _value for those window with no point (the _value of these window should be the same as the _value of the last point, ex. like I had checked in at 7.00pm and the _value should be 1 all the time after 7.00pm even some window does not have any point). I then tried something like this:

from(bucket: "event_name")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "user")
  |> aggregateWindow(every: 1m, fn: last, createEmpty: true)
  |> fill(usePrevious: true)
  |> group(columns: ["type"])
  |> aggregateWindow(every: 1m, fn: sum)
  |> yield()

I use last point for each window, then fill the window with empty _value with previous possible point, then sum up the _value of each window again. But then I found out that last function actually drop empty table, meaning that the window with no point is dropped (createEmpty is then useless). The problem then scope into that I must find function like last but without dropping empty table. I have tried reduce to create my own logic like last but sadly it didn't go like I want (might be that I coded it wrong).

If you have any idea, please help. Thank you very much.

Upvotes: 2

Views: 4521

Answers (1)

Blue_Togepi
Blue_Togepi

Reputation: 51

Nvm, I have found the solution, here is for those who are right in the same situation, though not very elegant in performance but it's the only query I have found it works.

from(bucket: "event_name")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "user")
  |> aggregateWindow(every: 1m, fn: last, createEmpty: false)
  |> aggregateWindow(every: 1m, fn: mean, createEmpty: true)
  |> fill(usePrevious: true)
  |> fill(value: 0.0)
  |> group(columns: ["type"])
  |> aggregateWindow(every: 1m, fn: sum, createEmpty: false)
  |> yield(name: "population")
  1. I use last first to have the latest state for each window (though last actually drop empty tables, so making createEmpty: true is useless anyways)
  2. Then for windows which does not have any point, I use mean with createEmpty: true in order to create points with null _value for empty windows. For windows that actually have real points, mean shouldn't change the value as there should only be 1 point per window because we used last earlier. The point of using mean here is just to create null points for empty windows. The step here is just to find a not-doing-anything function that doesn't drop empty table created by createEmpty. Fyi, I have tried many functions including making my own custom function like reduce and map but they do drop empty tables (and assigning null isn't even allowed), I even create an empty function like fn: (tables=<-, x) => tables for aggregateWindow but it drop empty tables anyways. So mean is my best bet here, though the side effect is my values changes from int to float.
  3. I use fill here to replace null points with value from the last window. This is why I'm trying to assign null to the point in empty windows from the last step, and mean can only does this. The second fill is for those early empty window which should represent 0-state.
  4. Then group by the type and summing them up should be the result I looking for

Hope I could help anyone who are in the same situation like me in the future

Upvotes: 3

Related Questions