Reputation: 1069
I have two dataframes, one with trade data (which can occur in any interval of time) and another which breaks time up into hourly intervals. Both contain a column with the date and time. There can be several trades within an hour and I would like to aggregate all the trades in an hour time period in the trades dataframe into one row in the hourly dataframe.
For example:
Trade Table
Time Amount
2014-09-10 13:01:01 10
2014-09-10 13:05:13 20
2014-09-10 13:59:40 1
2014-09-10 14:30:45 2
2014-09-10 14:50:00 30
I would like to aggregate this in the hourly table as follows:
Time Amount
2014-09-10 14:00:00 31 (i.e. 10+20+1)
2014-09-10 15:00:00 32 (i.e. 2+30)
Above, all trade between 2014-09-10 13:00:00
and 2014-09-10 14:00:00
have been aggregated and placed under 2014-09-10 14:00:00
My current code looks as follows:
for (m in 2:nrow(hourlyData))
{
hourlyData[m,2] = sum(as.numeric(tradeData[intersect(which(tradeData[,1] < hourlyData[m,1]),
which(tradeData[,1] >= hourlyData[m-1,1])),2])) #Get P&L aggregated by date
}
Which works, but is very slow.
Does anyone have advice on how I could do this without a for loop?
Thanks
Mike
Upvotes: 1
Views: 888
Reputation: 92300
Here's data.table
approach
library(data.table)
setDT(df)[, list(Amount = sum(Amount)),
by = list(Time = as.POSIXct(Time, format = "%Y-%m-%d %H") + 3600)]
# Time Amount
# 1: 2014-09-10 14:00:00 31
# 2: 2014-09-10 15:00:00 32
Edit: This is how you would do it efficiently (though less) with base R
with(df, tapply(Amount, (as.POSIXct(Time, format="%Y-%m-%d %H") + 3600), FUN = sum))
Or (less efficient, but returns a data.frame)
with(df, aggregate(Amount, list((as.POSIXct(Time, format="%Y-%m-%d %H") + 3600)), FUN = sum))
Or
aggregate(df$Amount ~ with(df, as.POSIXct(Time, format="%Y-%m-%d %H") + 3600), FUN = sum)
Upvotes: 4
Reputation: 887851
Try
library(dplyr)
df %>%
group_by(Time=as.POSIXct(Time, format="%Y-%m-%d %H")+3600) %>%
summarise(Amount=sum(Amount))
gives the output
# Time Amount
#1 2014-09-10 14 00:00 31
#2 2014-09-10 15 00:00 32
df <- structure(list(Time = c("2014-09-10 13:01:01", "2014-09-10 13:05:13",
"2014-09-10 13:59:40", "2014-09-10 14:30:45", "2014-09-10 14:50:00"
), Amount = c(10L, 20L, 1L, 2L, 30L)), .Names = c("Time", "Amount"
), class = "data.frame", row.names = c(NA, -5L))
Upvotes: 3