boniface316
boniface316

Reputation: 499

R - rollapply with multiple "by" values

I have trying to find an efficient way to do the code below:

library(zoo)
MaPrice <- function(x,N) {
    Mavg <- rollapply(x, N, mean)
    colnames(Mavg) <- "MaPrice"
    Mavg
}

Price.MA.1Hr <- MaPrice(out, 12)
Price.MA.2Hr <- MaPrice(out, 24)
Price.MA.4Hr <- MaPrice(out, 48)
Price.MA.6Hr <- MaPrice(out, 72)

The solution I came up with is the following:

MaPrice <- function(x,N) {
    MA <- matrix( ,nrow = nrow(x), ncol = length(N))
    for (i in 1:length(N)) {
        MA[,i]<- rollapply(x, N[i], mean)
    }
    MA
}

N <- c(1,2,4,6,8,12)

Price.MA <- MaPrice(Price, N)

Price is a vector (ncol = 1)

This still provides the answer I am looking for, but I am looking to see if there is an alternate maybe an efficient way. Any help is greatly appreciated.

Note: Already looked at the question "rollapply multiple time with diffrent arguments" on SO. Didnt understand the process.

Upvotes: 0

Views: 863

Answers (4)

CPak
CPak

Reputation: 13591

Reproducible data

N <- c(1,2,4,6,8,12)
set.seed(1)
Price <- data.frame(x=runif(20)*10)

#            x
# 1  2.6550866
# 2  3.7212390
# 3  5.7285336
# 4  9.0820779
# etc

NOTE Price can also be a vector and the solution works

Solution

Rewrite your function to return a data.frame with the N value used

MaPrice <- function(x,N) {
               Mavg <- data.frame(N = N, avg = rollapply(x, N, mean))
               Mavg
            }

You can use purrr::map_df to iterate through N

library(purrr)
Price.MA <- map_df(N, ~MaPrice(Price,.x))

Output

    N         x
1   1 2.6550866
2   1 3.7212390
3   1 5.7285336
4   1 9.0820779
5   1 2.0168193
# etc

Comparing Solutions

Since you might be interested in performance

Make Price a vector of 25,000 elements

N <- c(1,2,4,6,8,12)
set.seed(1)
Price <- runif(25000)*10

# parallel solution
library(parallel)
library(zoo)
PoGibas <- function(Price, N) {
               res <- mclapply(N, function(i) 
                         data.frame(i, rollapply(Price, i, mean)))
               # Final result
               do.call("rbind", res)
           }

# map_df solution
library(purrr)
MaPrice <- function(x,N) {
               Mavg <- data.frame(N = N, avg = rollapply(x, N, mean))
               Mavg
            }

CP <- function(Price, N) {
           Price.MA <- map_df(N, ~MaPrice(Price,.x))
       }

# mapply solution
out <- tbl_df(Price)
CArendt <- function() {
                mapply(function(x, n) {
                     rollapply(x, n, mean, fill = NA, align = "right")
                }, list(out), list(1, 2, 4, 6, 8, 12)) %>% tbl_df()
           }

# lapply zoo solution
library(zoo)
library(dplyr)
GG <- function(v, w) {
         z <- zoo(v)
         zz <- do.call("merge", lapply(setNames(w, w), rollmeanr, x = z))
}

Using microbechmark to compare solutions

library(microbenchmark)
microbenchmark(CP(Price,N), PoGibas(Price,N), CArendt())


              expr       min        lq      mean    median        uq       max
      CP(Price, N)  298.7038  308.9860  345.8867  334.0053  377.5278  468.1461
 PoGibas(Price, N)  306.3882  319.5721  358.8717  372.9655  388.6214  488.5565
         CArendt() 2589.2316 2647.2216 2762.0759 2682.7357 2733.5398 8746.8235
      GG(Price, N)  785.3042  853.5904  876.4554  869.0996  895.1906 1010.1746
 neval
   100
   100
   100
   100

The mean time of solutions is 353, 371, 876, and >2,000 ms

Upvotes: 2

G. Grothendieck
G. Grothendieck

Reputation: 270448

Assuming that the input vector is v this gives a zoo object zz whose ith column was formed using w[i]. as.data.frame(zz) or coredata(zz) could be used to produce a data.frame or matrix respectively if needed. setNames(w, w) could be reduced to just w if column names are not important.

# inputs
v <- 1:100  # data
w <- c(12, 24, 48, 72)

z <- zoo(v)
zz <- do.call("merge", lapply(setNames(w, w), rollmeanr, x = z))

or if a list of vectors is sufficient then:

lapply(setNames(w, w), rollmean, x = v)

Upvotes: 3

cole
cole

Reputation: 1757

So in the future, providing example data can make things easier for those who are trying to help. Also, it can speed things up to include packages and library statements and such so that they can run your code verbatim. (See the reprex package for useful tools in asking a good question).

I am a fan of using the apply family and list-based processing, so I would lean towards the following, along with dplyr. Getting used to the apply family can make this sort of task straightforward. Essentially, mapply loops over lists, applying the ith element to the ith call of the function (and recycling where needed).

library(zoo)
library(dplyr)


out <- tbl_df(randu[, 1])

## example with one
out %>% mutate(test = rollapply(., 12, mean, fill = NA))
#> # A tibble: 400 x 2
#>       value      test
#>       <dbl>     <dbl>
#>  1 0.000031        NA
#>  2 0.044495        NA
#>  3 0.822440        NA
#>  4 0.322291        NA
#>  5 0.393595        NA
#>  6 0.309097 0.4633195
#>  7 0.826368 0.5074730
#>  8 0.729424 0.5794351
#>  9 0.317649 0.5804980
#> 10 0.599793 0.5593651
#> # ... with 390 more rows

## example with multiple, using mapply - basically just applying rollapply...
mapply(function(x, n) {
  rollapply(x, n, mean, fill = NA, align = "right")
}, list(out), list(1, 2, 4, 6, 8, 12)) %>% tbl_df()
#> # A tibble: 400 x 6
#>          V1        V2        V3        V4        V5    V6
#>       <dbl>     <dbl>     <dbl>     <dbl>     <dbl> <dbl>
#>  1 0.000031        NA        NA        NA        NA    NA
#>  2 0.044495 0.0222630        NA        NA        NA    NA
#>  3 0.822440 0.4334675        NA        NA        NA    NA
#>  4 0.322291 0.5723655 0.2973143        NA        NA    NA
#>  5 0.393595 0.3579430 0.3957053        NA        NA    NA
#>  6 0.309097 0.3513460 0.4618558 0.3153248        NA    NA
#>  7 0.826368 0.5677325 0.4628377 0.4530477        NA    NA
#>  8 0.729424 0.7778960 0.5646210 0.5672025 0.4309676    NA
#>  9 0.317649 0.5235365 0.5456345 0.4830707 0.4706699    NA
#> 10 0.599793 0.4587210 0.6183085 0.5293210 0.5400821    NA
#> # ... with 390 more rows

## with lapply - probably more appropriate
lapply(list(1, 2, 4, 6, 8, 12)
, FUN = function(x, n) {
  return(rollapply(x, n, mean, fill = NA, align = "right"))
}, x = out) %>% setNames(., paste0("v", 1:6)) %>% do.call(bind_cols, .)
#> # A tibble: 400 x 6
#>          v1        v2        v3        v4        v5    v6
#>       <dbl>     <dbl>     <dbl>     <dbl>     <dbl> <dbl>
#>  1 0.000031        NA        NA        NA        NA    NA
#>  2 0.044495 0.0222630        NA        NA        NA    NA
#>  3 0.822440 0.4334675        NA        NA        NA    NA
#>  4 0.322291 0.5723655 0.2973143        NA        NA    NA
#>  5 0.393595 0.3579430 0.3957053        NA        NA    NA
#>  6 0.309097 0.3513460 0.4618558 0.3153248        NA    NA
#>  7 0.826368 0.5677325 0.4628377 0.4530477        NA    NA
#>  8 0.729424 0.7778960 0.5646210 0.5672025 0.4309676    NA
#>  9 0.317649 0.5235365 0.5456345 0.4830707 0.4706699    NA
#> 10 0.599793 0.4587210 0.6183085 0.5293210 0.5400821    NA
#> # ... with 390 more rows

One last note - I definitely recommend avoiding . in your variable names, as the period is used for S3 class dispatch (and it has been painful to remove the periods out of all of my code written before I knew that handy tidbit). Further reading on style

Upvotes: 2

pogibas
pogibas

Reputation: 28379

As you wanted alternative efficient way, here's solution using parallel. I'm applying rollapply on N vector (no reason to loop), but instead of usual apply we're running those things in parallel.

# Packages
library(parallel)
library(zoo)
# Input
N     <- 1:4
Price <- 1:10
# Main computation
res <- mclapply(N, function(i) 
                   data.frame(i, rollapply(Price, i, mean)))
# Final result
do.call("rbind", res)

Upvotes: 2

Related Questions