Reputation: 23
I have a data frame called "marketdata" which contains 3,000,000 rows (rownames: 1 to 3,000,000) and 2 columns (colnames: "mid", "bo").
> head(marketdata)
mid bo
1 250 0.05
2 251 0.07
3 252 0.13
4 249 0.08
5 250 0.12
My function is as follows:
movingWindow <- function (submarketdata) {
temp <- submarketdata[submarketdata$bo <= 0.1, ]
return( c(mean(temp$mid), NROW(temp)/100) )
}
result <- lapply(c(101:NROW(marketdata)), function(i) movingWindow( marketdata[ (i-99):i , ] ))
For example, for row 101, I will search through marketdata[2:101,]
. Then find out those rows that have "bo" value <= 0.1 as "effective sample". And finally calculate the average of these "effective samples" and the percentage of them.
However, this script runs really slow. It took about 15 minutes to finish all the 3,000,000 lines. Could any one help me to speed this up? Thank you.
Upvotes: 1
Views: 96
Reputation: 132706
set.seed(42)
marketdata <- data.frame(mid=runif(200, 245, 255),
bo=runif(200, 0, 0.2))
movingWindow <- function (submarketdata) {
temp <- submarketdata[submarketdata$bo <= 0.1, ]
return( c(mean(temp$mid), NROW(temp)/100) )
}
result <- t(sapply(c(101:NROW(marketdata)), function(i) movingWindow( marketdata[ (i-99):i , ] )))
#faster alternative:
library(zoo)
r1 <- rollmean(marketdata$bo <= 0.1, 100)
all.equal(r1[-1], result[,2])
r2 <- rollsum((marketdata$bo <= 0.1)*marketdata$mid, 100)/(100*r1)
result2 <- cbind(r2, r1)
#same result?
all.equal(result, unname(result2[-1,]))
#[1] TRUE
#base R alternative (assuming there are no NA values in your data)
r1a <- na.omit(filter(marketdata$bo <= 0.1, rep(0.01, 100)))
r2a <- na.omit(filter((marketdata$bo <= 0.1)*marketdata$mid, rep(1, 100)))/(100*r1a)
result2a <- cbind(r2a, r1a)
#same result?
all.equal(result, unname(result2a[-1,]))
#[1] TRUE
The alternatives give one value more (the first value). Otherwise the results are identical and both alternatives are much faster.
Benchmarks for the example:
Unit: microseconds
expr min lq median uq max neval
original 19006.144 19435.262 20824.245 21243.524 52965.168 100
alternative1 1444.574 1525.774 1607.264 1646.524 3486.940 100
alternative2 975.366 1006.913 1071.305 1106.437 3117.709 100
Upvotes: 3