Reputation: 349
I have a timeseries with a 5 minute resolution and I would like to aggregate (average) all values within 30 minutes intervals for each category (X/Y/Z) variable.
I have found many threads here on SO but none of them aggregates half-hourly and I don't have an idea how to combine the half-hourly aggregation with a second column. Furthermore, I would appreciate it if there would be a data.table solution, because of the superior performance. The original tables are several million rows long with 1000s of categories.
My data looks like this:
+---------------------+------+------------+
| Timestamp | DUID | Meter_Prod |
+---------------------+------+------------+
| 2018-03-01 00:00:00 | X | 1 |
| 2018-03-01 00:00:00 | Y | 2 |
| 2018-03-01 00:00:00 | Z | 3 |
| 2018-03-01 00:05:00 | X | 1 |
| 2018-03-01 00:05:00 | Y | 2 |
| 2018-03-01 00:05:00 | Z | 3 |
| ... | | |
| 2018-03-01 00:55:00 | X | 1 |
| 2018-03-01 00:55:00 | Y | 2 |
| 2018-03-01 00:55:00 | Z | 3 |
+---------------------+------+------------+
I would like to have this
+---------------------+------+--------------------+
| Timestamp | DUID | Meter_Prod_Average |
+---------------------+------+--------------------+
| 2018-03-01 00:00:00 | X | 1 |
| 2018-03-01 00:00:00 | Y | 2 |
| 2018-03-01 00:00:00 | Z | 3 |
| 2018-03-01 00:30:00 | X | 1 |
| 2018-03-01 00:30:00 | Y | 2 |
| 2018-03-01 00:30:00 | Z | 3 |
+---------------------+------+--------------------+
An example data frame is uploaded here: https://pastebin.com/4bESGTKH
Upvotes: 0
Views: 291
Reputation: 2017
You could try a data.table
rolling join. Create a new data.table with the time intervals you want:
head(dt)
# Timestamp DUID Meter_Prod
# 1 2018-03-01 00:00:00 X 15
# 2 2018-03-01 00:00:00 Y 122
# 3 2018-03-01 00:00:00 Z 6
# 4 2018-03-01 00:05:00 X 15
# 5 2018-03-01 00:05:00 Y 122
# 6 2018-03-01 00:05:00 Z 6
start_time <- as.POSIXct('2018-03-01 00:00:00')
new_time <- data.table(Timestamp=seq.POSIXt(start_time,max(dt$Timestamp),by=30*60)) # Will be deleted during join
new_time[,ts:=Timestamp] # Will be preserved for grouping
The new table should have a dummy column with the same timestamps for the join.
new_dt <- new_time[dt,on='Timestamp',roll=+Inf] #Join statement
new_dt[,.(Meter_Prod_Average=mean(Meter_Prod)),by=.(ts,DUID)] # Aggregation
# ts DUID Meter_Prod_Average
# 1: 2018-03-01 00:00:00 X 15
# 2: 2018-03-01 00:00:00 Y 122
# 3: 2018-03-01 00:00:00 Z 6
# 4: 2018-03-01 00:30:00 X 15
# 5: 2018-03-01 00:30:00 Y 122
# ---
# 140: 2018-03-01 23:00:00 Y 122
# 141: 2018-03-01 23:00:00 Z 6
# 142: 2018-03-01 23:30:00 X 2696
# 143: 2018-03-01 23:30:00 Y 122
# 144: 2018-03-01 23:30:00 Z 6
Based on the roll
argument in the join you can vary the position of the timestamp wrt the interval.
Upvotes: 1