Pratik Patil
Pratik Patil

Reputation: 3753

Apply functions to hourly data in R

I have the following data in HISTORY table with column names as:

ID, START_TIME, END_TIME, VALUE

51,2015-04-17 01:00:00,2015-04-17 01:10:00,98
51,2015-04-17 01:10:00,2015-04-17 01:20:00,96
51,2015-04-17 01:20:00,2015-04-17 01:30:00,97
51,2015-04-17 01:30:00,2015-04-17 01:40:00,99
51,2015-04-17 01:40:00,2015-04-17 01:50:00,98
51,2015-04-17 01:50:00,2015-04-17 02:00:00,105
51,2015-04-17 02:00:00,2015-04-17 02:10:00,103
51,2015-04-17 02:10:00,2015-04-17 02:20:00,101
51,2015-04-17 02:20:00,2015-04-17 02:30:00,100
51,2015-04-17 02:30:00,2015-04-17 02:40:00,104
51,2015-04-17 02:40:00,2015-04-17 02:50:00,102
51,2015-04-17 02:50:00,2015-04-17 03:00:00,98
51,2015-04-17 03:00:00,2015-04-17 03:10:00,97
51,2015-04-17 03:10:00,2015-04-17 03:20:00,96
51,2015-04-17 03:20:00,2015-04-17 03:30:00,99
51,2015-04-17 03:30:00,2015-04-17 03:40:00,100
51,2015-04-17 03:40:00,2015-04-17 03:50:00,101
51,2015-04-17 03:50:00,2015-04-17 04:00:00,102
51,2015-04-17 04:00:00,2015-04-17 04:10:00,99
51,2015-04-17 04:10:00,2015-04-17 04:20:00,104
51,2015-04-17 04:20:00,2015-04-17 04:30:00,105
51,2015-04-17 04:30:00,2015-04-17 04:40:00,103
51,2015-04-17 04:40:00,2015-04-17 04:50:00,98
51,2015-04-17 04:50:00,2015-04-17 05:00:00,97
51,2015-04-17 05:00:00,2015-04-17 05:10:00,101
51,2015-04-17 05:10:00,2015-04-17 05:20:00,103
51,2015-04-17 05:20:00,2015-04-17 05:30:00,101
51,2015-04-17 05:30:00,2015-04-17 05:40:00,105
51,2015-04-17 05:40:00,2015-04-17 05:50:00,102
51,2015-04-17 05:50:00,2015-04-17 06:00:00,98

I want to apply the function like max() to the VALUE column but with some frequency. If frequency is suppose 1hour then max function will be applied for 5 different sets for this data.

Ex. From start time 2015-04-17 01:00:00 to 2015-04-17 02:00:00 and so on. How to achieve this in r. Final output would be like this:

51, 2015-04-17 02:00:00, 105
51, 2015-04-17 03:00:00, 102
51, 2015-04-17 04:00:00, 104
51, 2015-04-17 05:00:00, 105
51, 2015-04-17 06:00:00, 105

where above columns are ID, START_TIME upto which max() is calculated, value which is the result of max() function for that hour. How to make this possible in r. Using intervals or something else?

Thanks..

Upvotes: 1

Views: 206

Answers (3)

David Arenburg
David Arenburg

Reputation: 92282

Here's another way using data.table

library(data.table)
setDT(df)[, .(MAX_VALUE = max(VALUE)), 
             by = .(ID, START_TIME = as.POSIXct(START_TIME, format = "%F %H") + 3600)]
#    ID          START_TIME MAX_VALUE
# 1: 51 2015-04-17 02:00:00       105
# 2: 51 2015-04-17 03:00:00       104
# 3: 51 2015-04-17 04:00:00       102
# 4: 51 2015-04-17 05:00:00       105
# 5: 51 2015-04-17 06:00:00       105

Or a similar solution without any packages dependencies

df$START_TIME2 <-  as.POSIXct(df$START_TIME, format = "%F %H") + 3600
aggregate(VALUE ~ ID + START_TIME2, df, max)
#   ID         START_TIME2 VALUE
# 1 51 2015-04-17 02:00:00   105
# 2 51 2015-04-17 03:00:00   104
# 3 51 2015-04-17 04:00:00   102
# 4 51 2015-04-17 05:00:00   105
# 5 51 2015-04-17 06:00:00   105

Upvotes: 4

pogibas
pogibas

Reputation: 28309

Here's a possible solution using data.table

library(data.table)
setDT(df)[, max(VALUE), by = .(START_TIME = sub(":.*", "", START_TIME))]
     START_TIME  V1
1: 2015-04-17 01 105
2: 2015-04-17 02 104
3: 2015-04-17 03 102
4: 2015-04-17 04 105
5: 2015-04-17 05 105

Upvotes: 1

lukeA
lukeA

Reputation: 54237

You could try

library(dplyr)
HISTORY %>% 
  group_by(ID, TIME = format(START_TIME + 60*60, "%Y-%m-%d %H:00:00")) %>% 
  summarise(MAX_VALUE = max(VALUE))
#   ID                TIME MAX_VALUE
# 1 51 2015-04-17 02:00:00       105
# 2 51 2015-04-17 03:00:00       104
# 3 51 2015-04-17 04:00:00       102
# 4 51 2015-04-17 05:00:00       105
# 5 51 2015-04-17 06:00:00       105

Upvotes: 3

Related Questions