Mike
Mike

Reputation: 1069

R: Aggregating data between two dates without a loop

I have two dataframes, one with trade data (which can occur in any interval of time) and another which breaks time up into hourly intervals. Both contain a column with the date and time. There can be several trades within an hour and I would like to aggregate all the trades in an hour time period in the trades dataframe into one row in the hourly dataframe.

For example:
Trade Table

       Time                  Amount  
2014-09-10 13:01:01            10  
2014-09-10 13:05:13            20
2014-09-10 13:59:40             1  
2014-09-10 14:30:45             2  
2014-09-10 14:50:00            30  

I would like to aggregate this in the hourly table as follows:

        Time         Amount
2014-09-10 14:00:00   31   (i.e. 10+20+1)
2014-09-10 15:00:00   32   (i.e. 2+30)

Above, all trade between 2014-09-10 13:00:00 and 2014-09-10 14:00:00 have been aggregated and placed under 2014-09-10 14:00:00 My current code looks as follows:

for (m in 2:nrow(hourlyData))
{
      hourlyData[m,2] = sum(as.numeric(tradeData[intersect(which(tradeData[,1] < hourlyData[m,1]),
        which(tradeData[,1] >= hourlyData[m-1,1])),2])) #Get P&L aggregated by date        
}  

Which works, but is very slow.

Does anyone have advice on how I could do this without a for loop?

Thanks

Mike

Upvotes: 1

Views: 888

Answers (2)

David Arenburg
David Arenburg

Reputation: 92300

Here's data.table approach

library(data.table)
setDT(df)[, list(Amount = sum(Amount)), 
            by = list(Time = as.POSIXct(Time, format = "%Y-%m-%d %H") + 3600)]
#                   Time Amount
# 1: 2014-09-10 14:00:00     31
# 2: 2014-09-10 15:00:00     32

Edit: This is how you would do it efficiently (though less) with base R

with(df, tapply(Amount, (as.POSIXct(Time, format="%Y-%m-%d %H") + 3600), FUN = sum))

Or (less efficient, but returns a data.frame)

with(df, aggregate(Amount, list((as.POSIXct(Time, format="%Y-%m-%d %H") + 3600)), FUN = sum))

Or

aggregate(df$Amount ~ with(df, as.POSIXct(Time, format="%Y-%m-%d %H") + 3600), FUN = sum)

Upvotes: 4

akrun
akrun

Reputation: 887851

Try

 library(dplyr)
 df %>%
      group_by(Time=as.POSIXct(Time, format="%Y-%m-%d %H")+3600) %>%
      summarise(Amount=sum(Amount))

gives the output

  #                 Time Amount
  #1 2014-09-10 14 00:00     31
  #2 2014-09-10 15 00:00     32

data

df <- structure(list(Time = c("2014-09-10 13:01:01", "2014-09-10 13:05:13", 
"2014-09-10 13:59:40", "2014-09-10 14:30:45", "2014-09-10 14:50:00"
), Amount = c(10L, 20L, 1L, 2L, 30L)), .Names = c("Time", "Amount"
), class = "data.frame", row.names = c(NA, -5L))

Upvotes: 3

Related Questions