Gobrel
Gobrel

Reputation: 351

How to correctly resample time series data?

I have time series data on a PostgreSQL database. Currently I import this into pandas to do some analysis and my first step often times is resampling the 5 min frequency data to 1 hour average data. I do this so that I first pivot the data to bring it in wide form, then I resample it to 1 hour and after that I melt it to have it in long form again.

Now I want to do the resampling on the Database so that I can import the 1 hour average right away.

This is what the data looks like on the Database. I have two different types with 3 different names.

    datetime                    value   type   name
0    2018-01-01 13:35:00+01:00   0.22    HLN    NO2
1    2018-01-01 13:35:00+01:00   0.31    HLN    CO
2    2018-01-01 13:35:00+01:00   1.15    HLN    NO
3    2018-01-01 13:40:00+01:00   1.80    AIS    NO2
4    2018-01-01 13:40:00+01:00   2.60    AIS    CO
5    2018-01-01 13:40:00+01:00   2.30    AIS    NO
6    2018-01-01 13:45:00+01:00   2.25    HLN    NO2
7    2018-01-01 13:45:00+01:00   2.14    HLN    CO
8    2018-01-01 13:45:00+01:00   2.96    HLN    NO
9    2018-01-01 14:35:00+01:00   0.76    HLN    NO2
10   2018-01-01 14:35:00+01:00   0.80    HLN    CO
11   2018-01-01 14:35:00+01:00   1.19    HLN    NO
12   2018-01-01 14:40:00+01:00   1.10    AIS    NO2
13   2018-01-01 14:40:00+01:00   2.87    AIS    CO
14   2018-01-01 14:40:00+01:00   2.80    AIS    NO
15   2018-01-01 14:45:00+01:00   3.06    HLN    NO2
16   2018-01-01 13:45:00+01:00   2.86    HLN    CO
17   2018-01-01 13:45:00+01:00   2.22    HLN    NO

Now comes the part which I have problems with. After resampling and plotting it in pandas and plotly I get the expected result which is right, one value for every hour:

enter image description here

After performing a SQL Query to resample it to one hour with the following code:

SELECT date_trunc('hour', datetime) AS hour, type, AVG(value) AS measure, name
                    FROM data_table
                    GROUP BY datetime, type, name
                    ORDER BY datetime

I get this after plotting:

enter image description here

This is not smooth and there are multiple values in the hour, I guess there are all the values within the hour.

My question, how can I correctly resample a time series in SQL?

Edit: Expected result in table form:

 datetime             value  type  name
2018-01-01 13:00:00   1.235   HLN   NO2
2018-01-01 13:00:00   2.65   HLN   CO
2018-01-01 13:00:00   2.96   HLN   NO
2018-01-01 13:00:00   2.48   AIS   NO2
2018-01-01 13:00:00   2.65   AIS   CO
2018-01-01 13:00:00   2.26   AIS   NO
2018-01-01 14:00:00   2.78   HLN   NO2
2018-01-01 14:00:00   3.65   HLN   CO
2018-01-01 14:00:00   1.95   HLN   NO
2018-01-01 14:00:00   1.45   AIS   NO2
2018-01-01 14:00:00   1.64   AIS   CO
2018-01-01 14:00:00   3.23   AIS   NO

Upvotes: 2

Views: 2646

Answers (1)

Jim Jones
Jim Jones

Reputation: 19643

An alternative is to create the time intervals using generate_series() or just with in a subquery / CTE truncate the hours per type and name, and in the outer query join both records and aggregate the values with avg() by the columns hour, type and name, e.g.

WITH j AS (
  SELECT DISTINCT date_trunc('hour', datetime) AS hour, type,name 
  FROM data_table
)
SELECT j.*, avg(d.value)
FROM data_table d
JOIN j ON date_trunc('hour', d.datetime) = j.hour AND 
          j.type = d.type AND 
          d.name = j.name
GROUP BY j.hour, j.name, j.type
ORDER BY j.hour ASC,j.type DESC;

returns

hour type name avg
2018-01-01 13:00:00 HLN CO 1.7700000000000000
2018-01-01 13:00:00 HLN NO 2.1100000000000000
2018-01-01 13:00:00 HLN NO2 1.23500000000000000000
2018-01-01 13:00:00 AIS CO 2.6000000000000000
2018-01-01 13:00:00 AIS NO 2.3000000000000000
2018-01-01 13:00:00 AIS NO2 1.80000000000000000000
2018-01-01 14:00:00 HLN CO 0.80000000000000000000
2018-01-01 14:00:00 HLN NO 1.19000000000000000000
2018-01-01 14:00:00 HLN NO2 1.9100000000000000
2018-01-01 14:00:00 AIS CO 2.8700000000000000
2018-01-01 14:00:00 AIS NO 2.8000000000000000
2018-01-01 14:00:00 AIS NO2 1.10000000000000000000

db<>fiddle demo

Upvotes: 1

Related Questions