Reputation: 496
Our sensors produce values in irregular time intervals:
Time     Value
12:00    10
12:02    20
12:22    30
12:29    40
I am trying to find a time series database that can automatically calculate the average value over certain regular time intervals (e.g. 10 min). Of course, the longer a value was valid within the interval, the more weight it has in the average (a time-weighted average). E.g. for 12:00-12:10: (10*2 + 20*8)/10 = 18.
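In other words, the time-weighted average over a window is sum(value_i * duration_i) / window_length, where duration_i is the time for which value_i was valid within the window.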
I have been searching the internet for hours now and have found lots of time series databases that talk about irregular time series (e.g. InfluxDB, OpenTSDB, etc.), and most of them have some SQL-like query language with aggregation functions.
Unfortunately, they don't say how exactly irregular time intervals are averaged. As I don't want to try all of them, can someone tell me which databases support the calculation of a time-weighted average? Thanks!
Upvotes: 4
Views: 4142
Reputation: 18084
The time-weighted average (TWA) can be calculated if the TSDB supports a function for integrating values over a given time range. The TWA is then the integral over the given duration divided by that duration. For example, the following query calculates the time-weighted average for the metric power over the last hour in VictoriaMetrics:
integrate(power[1h])/1h
See more details about the integrate() function in the MetricsQL docs.
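Applied to the question's example, the integral over 12:00-12:10 is 10*2min + 20*8min = 180 value-minutes, and dividing by the 10-minute duration gives 18, the expected time-weighted average.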
Upvotes: 0
Reputation: 4559
A weighted time average aggregator (wtavg) is supported in Axibase Time Series Database: http://axibase.com/products/axibase-time-series-database/visualization/widgets/configuring-the-widgets/aggregators/
wtavg weighs older samples at a linearly decreasing rate compared to the current time.
This aggregator is supported in the REST API, the SQL layer, and the rule engine.
EDIT 2016-06-15T12:52:00Z: interpolation functions are supported as well.
Disclosure: I work for Axibase.
Upvotes: 1
Reputation: 21
I recently had to provide a solution for weighted averages over irregular samples for our own SCADA/IoT product, with the data stored in PostgreSQL. If you'd like to roll your own, here's how you can do it.
Let's assume the following table:
create table samples (
    stamp  timestamptz,
    series integer,
    value  float
);

insert into samples values
    ('2018-04-30 23:00:00+02', 1, 12.3),
    ('2018-05-01 01:45:00+02', 1, 22.2),
    ('2018-05-01 02:13:00+02', 1, 21.6),
    ('2018-05-01 02:26:00+02', 1, 14.9),
    ('2018-05-01 03:02:00+02', 1, 16.9);
To calculate a regular, time-weighted average, we'll need to do the following:
- align each sample to the regular period it falls into;
- make sure each period begins with a record carrying the value that was valid at that moment;
- compute how long each value remained valid within its period, which gives its weight;
- sum up value * weight per period.
Before presenting the code, we'll make the following assumptions:
- a sample's value remains valid until the next sample arrives (step interpolation, no linear interpolation);
- there is at least one sample before the beginning of the queried time range.
Suppose we are interested in calculating the hourly weighted average for the time period between 2018-05-01 00:00:00+02 and 2018-05-01 04:00:00+02 for series 1. We'll start by querying for the given time range, adding an aligned stamp:
select
    stamp,
    -- truncate the stamp to the start of its hour: integer division
    -- by 3600 seconds discards the minutes and seconds
    to_timestamp(extract(epoch from stamp)::integer / 3600 * 3600)
        as stamp_aligned,
    value
from samples
where
    series = 1 and
    stamp >= '2018-05-01 00:00:00+02' and
    stamp <= '2018-05-01 04:00:00+02';
Which gives us:
         stamp          |     stamp_aligned      | value
------------------------+------------------------+-------
 2018-05-01 01:45:00+02 | 2018-05-01 01:00:00+02 |  22.2
 2018-05-01 02:13:00+02 | 2018-05-01 02:00:00+02 |  21.6
 2018-05-01 02:26:00+02 | 2018-05-01 02:00:00+02 |  14.9
 2018-05-01 03:02:00+02 | 2018-05-01 03:00:00+02 |  16.9
(4 rows)
We will have noticed that:
- the result contains neither the value valid at 00:00:00 nor the value valid at 01:00:00 (the last sample before the range, 12.3, is still in effect there);
- the stamp_aligned column tells us to which time period a record belongs, but the table is missing a record with the value at the beginning of each period.
To solve these problems, we'll query for the last known value before the given time range, and add records for the round hours, which we'll later fill in with the correct values:
with
    t_values as (
        select * from (
            -- select the last value prior to the time range
            (select
                stamp,
                to_timestamp(extract(epoch from stamp)::integer / 3600 * 3600)
                    as stamp_aligned,
                value,
                false as filled_in
            from samples
            where
                series = 1 and
                stamp < '2018-05-01 00:00:00+02'
            order by
                stamp desc
            limit 1)
            union
            -- select the records from the given time range
            (select
                stamp,
                to_timestamp(extract(epoch from stamp)::integer / 3600 * 3600)
                    as stamp_aligned,
                value,
                false as filled_in
            from samples
            where
                series = 1 and
                stamp >= '2018-05-01 00:00:00+02' and
                stamp <= '2018-05-01 04:00:00+02'
            order by
                stamp)
            union
            -- generate one empty record for each regular period boundary
            (select
                stamp,
                stamp as stamp_aligned,
                null as value,
                true as filled_in
            from generate_series(
                '2018-05-01 00:00:00+02',
                '2018-05-01 04:00:00+02',
                interval '3600 seconds'
            ) stamp)
        ) states
        order by stamp
    )
select * from t_values;
Which gives us:
         stamp          |     stamp_aligned      | value | filled_in
------------------------+------------------------+-------+-----------
 2018-04-30 23:00:00+02 | 2018-04-30 23:00:00+02 |  12.3 | f
 2018-05-01 00:00:00+02 | 2018-05-01 00:00:00+02 |     ¤ | t
 2018-05-01 01:00:00+02 | 2018-05-01 01:00:00+02 |     ¤ | t
 2018-05-01 01:45:00+02 | 2018-05-01 01:00:00+02 |  22.2 | f
 2018-05-01 02:00:00+02 | 2018-05-01 02:00:00+02 |     ¤ | t
 2018-05-01 02:13:00+02 | 2018-05-01 02:00:00+02 |  21.6 | f
 2018-05-01 02:26:00+02 | 2018-05-01 02:00:00+02 |  14.9 | f
 2018-05-01 03:00:00+02 | 2018-05-01 03:00:00+02 |     ¤ | t
 2018-05-01 03:02:00+02 | 2018-05-01 03:00:00+02 |  16.9 | f
 2018-05-01 04:00:00+02 | 2018-05-01 04:00:00+02 |     ¤ | t
(10 rows)
So now we have at least one record for each time period, but we still need to fill in the values for the generated (filled_in) records:
with
    t_values as (
        ...
    ),
    -- since the records generated using generate_series do not contain
    -- values, we need to copy the value from the last non-generated record
    t_with_filled_in_values as (
        -- the outer query serves to remove any record prior to the given
        -- time range
        select *
        from (
            select
                stamp,
                stamp_aligned,
                -- fill in the value from the last non-filled record (the
                -- first record having the same filled_in_partition value)
                (case when filled_in then
                    first_value(value) over (partition by filled_in_partition
                                             order by stamp)
                 else value end) as value
            from (
                select
                    stamp,
                    stamp_aligned,
                    value,
                    filled_in,
                    -- this field is incremented on every non-filled record
                    sum(case when filled_in then 0 else 1 end)
                        over (order by stamp) as filled_in_partition
                from
                    t_values
            ) t_filled_in_partition
        ) t_filled_in_values
        -- we wrap the filling-in query in order to remove any record before
        -- the beginning of the given time range
        where stamp >= '2018-05-01 00:00:00+02'
        order by stamp
    )
select * from t_with_filled_in_values;
Which gives us the following:
         stamp          |     stamp_aligned      | value
------------------------+------------------------+-------
 2018-05-01 00:00:00+02 | 2018-05-01 00:00:00+02 |  12.3
 2018-05-01 01:00:00+02 | 2018-05-01 01:00:00+02 |  12.3
 2018-05-01 01:45:00+02 | 2018-05-01 01:00:00+02 |  22.2
 2018-05-01 02:00:00+02 | 2018-05-01 02:00:00+02 |  22.2
 2018-05-01 02:13:00+02 | 2018-05-01 02:00:00+02 |  21.6
 2018-05-01 02:26:00+02 | 2018-05-01 02:00:00+02 |  14.9
 2018-05-01 03:00:00+02 | 2018-05-01 03:00:00+02 |  14.9
 2018-05-01 03:02:00+02 | 2018-05-01 03:00:00+02 |  16.9
 2018-05-01 04:00:00+02 | 2018-05-01 04:00:00+02 |  16.9
(9 rows)
So we're all good: we've added records with the correct values for all round hours, and we've removed the first record, which gave us the value for the beginning of the time range but was lying outside of it. Now we are ready for the next step.
We'll continue by calculating the weight of each record, i.e. its duration divided by the period:
with
    t_values as (
        ...
    ),
    t_with_filled_in_values as (
        ...
    ),
    t_with_weight as (
        select
            stamp,
            stamp_aligned,
            value,
            -- use a window to get the stamp of the next record in order to
            -- calculate the duration of this record which, divided by the
            -- period, gives us the weight; the last record gets the full
            -- period
            coalesce(extract(epoch from (lead(stamp)
                over (order by stamp) - stamp)), 3600)::float / 3600 as weight
        from t_with_filled_in_values
        order by stamp
    )
select * from t_with_weight;
Which gives us:
         stamp          |     stamp_aligned      | value |       weight
------------------------+------------------------+-------+--------------------
 2018-05-01 00:00:00+02 | 2018-05-01 00:00:00+02 |  12.3 |                  1
 2018-05-01 01:00:00+02 | 2018-05-01 01:00:00+02 |  12.3 |               0.75
 2018-05-01 01:45:00+02 | 2018-05-01 01:00:00+02 |  22.2 |               0.25
 2018-05-01 02:00:00+02 | 2018-05-01 02:00:00+02 |  22.2 |  0.216666666666667
 2018-05-01 02:13:00+02 | 2018-05-01 02:00:00+02 |  21.6 |  0.216666666666667
 2018-05-01 02:26:00+02 | 2018-05-01 02:00:00+02 |  14.9 |  0.566666666666667
 2018-05-01 03:00:00+02 | 2018-05-01 03:00:00+02 |  14.9 | 0.0333333333333333
 2018-05-01 03:02:00+02 | 2018-05-01 03:00:00+02 |  16.9 |  0.966666666666667
 2018-05-01 04:00:00+02 | 2018-05-01 04:00:00+02 |  16.9 |                  1
(9 rows)
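Note that within each period the weights sum to 1 (e.g. 0.75 + 0.25 for the 01:00 period, or 13/60 + 13/60 + 34/60 for the 02:00 period), which is why the final step can simply sum value * weight without dividing by anything.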
All that's left is to sum it up:
with
    t_values as (
        ...
    ),
    t_with_filled_in_values as (
        ...
    ),
    t_with_weight as (
        ...
    )
select
    stamp_aligned as stamp,
    sum(value * weight) as avg
from t_with_weight
group by stamp_aligned
order by stamp_aligned;
The result:
         stamp          |       avg
------------------------+------------------
 2018-05-01 00:00:00+02 |             12.3
 2018-05-01 01:00:00+02 |           14.775
 2018-05-01 02:00:00+02 | 17.9333333333333
 2018-05-01 03:00:00+02 | 16.8333333333333
 2018-05-01 04:00:00+02 |             16.9
(5 rows)
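As a sanity check, the 01:00 period works out to 12.3 * 0.75 + 22.2 * 0.25 = 9.225 + 5.55 = 14.775, which matches a hand calculation in the style of the question.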
You can find the complete code in this gist.
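As a side note (a minor variation, not part of the solution above): on PostgreSQL the epoch arithmetic used for alignment can also be written with the built-in date_trunc function, which performs the same hourly truncation for whole-hour time zones:
select
    stamp,
    -- truncate the timestamp to the start of its hour
    date_trunc('hour', stamp) as stamp_aligned,
    value
from samples
where series = 1;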
Upvotes: 2
Reputation: 964
OpenTSDB performs aggregation across all series in a query at the time(s) implied by the query. For any series not having a data value at a timestamp, it linearly interpolates a value from the values before and after. It does this "upsampling" at query time; the original data is always stored as it arrived. You can perform a trailing windowed time average, but not an exponentially weighted moving average (I believe that is what you meant by time-weighted?).
http://opentsdb.net/docs/build/html/user_guide/query/aggregators.html
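To illustrate the interpolation with the question's data (my own worked example): a value requested at 12:10 would be interpolated between the 12:02 sample (20) and the 12:22 sample (30) as 20 + (30 - 20) * 8/20 = 24.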
(I should add that this is not a blanket recommendation for OpenTSDB as the DB you should use; I'm just responding to your question.)
Upvotes: 2