Reputation: 3753
I have the following data in HISTORY table with column names as:
ID, START_TIME, END_TIME, VALUE
51,2015-04-17 01:00:00,2015-04-17 01:10:00,98
51,2015-04-17 01:10:00,2015-04-17 01:20:00,96
51,2015-04-17 01:20:00,2015-04-17 01:30:00,97
51,2015-04-17 01:30:00,2015-04-17 01:40:00,99
51,2015-04-17 01:40:00,2015-04-17 01:50:00,98
51,2015-04-17 01:50:00,2015-04-17 02:00:00,105
51,2015-04-17 02:00:00,2015-04-17 02:10:00,103
51,2015-04-17 02:10:00,2015-04-17 02:20:00,101
51,2015-04-17 02:20:00,2015-04-17 02:30:00,100
51,2015-04-17 02:30:00,2015-04-17 02:40:00,104
51,2015-04-17 02:40:00,2015-04-17 02:50:00,102
51,2015-04-17 02:50:00,2015-04-17 03:00:00,98
51,2015-04-17 03:00:00,2015-04-17 03:10:00,97
51,2015-04-17 03:10:00,2015-04-17 03:20:00,96
51,2015-04-17 03:20:00,2015-04-17 03:30:00,99
51,2015-04-17 03:30:00,2015-04-17 03:40:00,100
51,2015-04-17 03:40:00,2015-04-17 03:50:00,101
51,2015-04-17 03:50:00,2015-04-17 04:00:00,102
51,2015-04-17 04:00:00,2015-04-17 04:10:00,99
51,2015-04-17 04:10:00,2015-04-17 04:20:00,104
51,2015-04-17 04:20:00,2015-04-17 04:30:00,105
51,2015-04-17 04:30:00,2015-04-17 04:40:00,103
51,2015-04-17 04:40:00,2015-04-17 04:50:00,98
51,2015-04-17 04:50:00,2015-04-17 05:00:00,97
51,2015-04-17 05:00:00,2015-04-17 05:10:00,101
51,2015-04-17 05:10:00,2015-04-17 05:20:00,103
51,2015-04-17 05:20:00,2015-04-17 05:30:00,101
51,2015-04-17 05:30:00,2015-04-17 05:40:00,105
51,2015-04-17 05:40:00,2015-04-17 05:50:00,102
51,2015-04-17 05:50:00,2015-04-17 06:00:00,98
I want to apply the function like max() to the VALUE column but with some frequency. If frequency is suppose 1hour then max function will be applied for 5 different sets for this data.
Ex. From start time 2015-04-17 01:00:00 to 2015-04-17 02:00:00 and so on. How to achieve this in r. Final output would be like this:
51, 2015-04-17 02:00:00, 105
51, 2015-04-17 03:00:00, 102
51, 2015-04-17 04:00:00, 104
51, 2015-04-17 05:00:00, 105
51, 2015-04-17 06:00:00, 105
where above columns are ID, START_TIME upto which max() is calculated, value which is the result of max() function for that hour. How to make this possible in r. Using intervals or something else?
Thanks..
Upvotes: 1
Views: 206
Reputation: 92282
Here's another way using data.table
library(data.table)
setDT(df)[, .(MAX_VALUE = max(VALUE)),
by = .(ID, START_TIME = as.POSIXct(START_TIME, format = "%F %H") + 3600)]
# ID START_TIME MAX_VALUE
# 1: 51 2015-04-17 02:00:00 105
# 2: 51 2015-04-17 03:00:00 104
# 3: 51 2015-04-17 04:00:00 102
# 4: 51 2015-04-17 05:00:00 105
# 5: 51 2015-04-17 06:00:00 105
Or a similar solution without any packages dependencies
df$START_TIME2 <- as.POSIXct(df$START_TIME, format = "%F %H") + 3600
aggregate(VALUE ~ ID + START_TIME2, df, max)
# ID START_TIME2 VALUE
# 1 51 2015-04-17 02:00:00 105
# 2 51 2015-04-17 03:00:00 104
# 3 51 2015-04-17 04:00:00 102
# 4 51 2015-04-17 05:00:00 105
# 5 51 2015-04-17 06:00:00 105
Upvotes: 4
Reputation: 28309
Here's a possible solution using data.table
library(data.table)
setDT(df)[, max(VALUE), by = .(START_TIME = sub(":.*", "", START_TIME))]
START_TIME V1
1: 2015-04-17 01 105
2: 2015-04-17 02 104
3: 2015-04-17 03 102
4: 2015-04-17 04 105
5: 2015-04-17 05 105
Upvotes: 1
Reputation: 54237
You could try
library(dplyr)
HISTORY %>%
group_by(ID, TIME = format(START_TIME + 60*60, "%Y-%m-%d %H:00:00")) %>%
summarise(MAX_VALUE = max(VALUE))
# ID TIME MAX_VALUE
# 1 51 2015-04-17 02:00:00 105
# 2 51 2015-04-17 03:00:00 104
# 3 51 2015-04-17 04:00:00 102
# 4 51 2015-04-17 05:00:00 105
# 5 51 2015-04-17 06:00:00 105
Upvotes: 3