Dugong
Dugong

Reputation: 77

Centered moving average in R (without using packages)

I have been constructing a function for centered moving average in R (without using any packages), and have encountered a challenge as below:

As you know, the centered moving average includes the concept of incorporating the 'incomplete portions' (i.e. at the beginning and the end of the datapoint). For example, consider below vector p:

p <- c(10,20,30,40,50,60,70,80,90)

In this case, centered moving average that I am interested in looks like this:

x <- ((10+20)/2, (10+20+30)/3, (20+30+40)/3 ..... (70+80+90)/3, (80+90)/2)

To achieve above, I tried function with if function as below:

wd means window size

mov_avg <- function(p, wd) {
  x <- c(0, cumsum(p))
  if ((p > p[1])&(p < p[length(p)])) {
    neut <- 1:(length(p)-(wd-1))
    upper <- neut+(wd-1)
    x <- (x[upper]-x[neut])/(upper-neut)
  } else if (p==p[1]) {
    neut <- 0
    upper <- neut+3
    x <- (x[upper]-x[neut])/(upper-1-neut)
  } else if (p==p[length(p)]) {
    upper <-(length(p)+1)
    neut <- (length(p)-(wd-2))
    x <- (x[upper]-x[neut])/(upper-neut)
  }
  return(x)
}

Then I entered below line to execute:

mov_avg(p, 3)

I encountered errors as below:

numeric(0)
Warning messages:
1: In if ((p > p[1]) & (p < p[length(p)])) { :
  the condition has length > 1 and only the first element will be used
2: In if (p == p[1]) { :
  the condition has length > 1 and only the first element will be used

Could someone help me out in making this a working function?

Thank you!

Upvotes: 4

Views: 1574

Answers (5)

nisetama
nisetama

Reputation: 8923

This is a fast vectorized method (but it was still slower than Akrun's method in my benchmark):

> p=seq(10,90,10)
> o=outer(1:length(p),-1:1,"+")
> rowMeans(matrix(p[ifelse(o<1,NA,o)],length(p)),na.rm=T)
[1] 15 20 30 40 50 60 70 80 85

Benchmark:

v=sample(1e4)

vectorized=\(x,y){
  o=outer(1:length(x),y,"+")
  rowMeans(matrix(x[ifelse(o<1,NA,o)],length(x)),na.rm=T)
}

nonvectorized=\(x,y){
  l=length(x)
  sapply(1:l,\(i)mean(x[max(1,i-y):min(i+y,l)],na.rm=T))
}

mov_avg <- function(p, window) {
 mean_number = numeric()
 index = 1
 while(index < length(p)) {
   if (index == 1 | index == length(p) - 1)
    mean_number = c(mean_number, mean(p[index:(index + window - 2)]))
   else
    mean_number = c(mean_number, mean(p[index:(index + window - 1)]))
  index = index + 1
  }
  mean_number
}

bench=\(times,...){
  arg=match.call(expand.dots=F)$...;l=length(arg);out=double(times*l)
  rand=sample(rep(1:l,times))
  n=1;for(x in arg[rand]){t1=Sys.time();eval.parent(x);t2=Sys.time()-t1;out[n]=t2;n=n+1}
  setNames(out,sapply(arg[rand],\(x)gsub("  +"," ",paste(deparse(x),collapse=" "))))
}

bench(100,
  vectorized(v,-1:1),
  nonvectorized(v,1)
  rowMeans(embed(c(NA,v,NA),3),na.rm=T),
  apply(matrix(c(v,c((v[1]+v[2])/2,head(v,-1)),c(tail(v,-1),sum(tail(v,2))/2)),ncol=3),1,mean),
  mov_avg(v,3),
  zoo::rollapplyr(v,3,mean,align="center",partial=T),
  zoo::rollapply(c(NA,v,NA),3,mean,na.rm=T),
  slider::slide_dbl(v,mean,.before=1,.after=1)
)

o=sort(tapply(b,names(b),median))
writeLines(sprintf("%.1f %s",o/min(o),names(o)))

The output shows the median time of a hundred runs relative to the fastest option:

1.0 rowMeans(embed(c(NA, v, NA), 3), na.rm = T)
2.4 vectorized(v, -1:1)
69.1 slider::slide_dbl(v, mean, .before = 1, .after = 1)
75.1 apply(matrix(c(v, c((v[1] + v[2])/2, head(v, -1)), c(tail(v, -1), sum(tail(v, 2))/2)), ncol = 3), 1, mean)
96.8 nonvectorized(v, 1)
153.5 zoo::rollapply(c(NA, v, NA), 3, mean, na.rm = T)
167.6 zoo::rollapplyr(v, 3, mean, align = "center", partial = T)
393.0 mov_avg(v, 3)

Upvotes: 0

akrun
akrun

Reputation: 887741

We could also use rowMeans

rowMeans(embed(c(NA, p, NA),  3)[, 3:1], na.rm = TRUE)
#[1] 15 20 30 40 50 60 70 80 85

Upvotes: 3

IRTFM
IRTFM

Reputation: 263451

Take the mean by rows in a matrix with columns that are x, and the head and tail appended with the means respectively of the first two and last two elements.

apply( matrix( c(x, 
               c( x[1]+x[2])/2, head(x,-1) ),
               c( tail(x,-1), sum( tail(x,2))/2)  ),
               ncol = 3),
       1, mean)

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389225

Another method is to create a function where we can adjust with variable windows

mov_avg <- function(p, window) {
 mean_number = numeric()
 index = 1
 while(index < length(p)) {
   if (index == 1 | index == length(p) - 1) 
    mean_number = c(mean_number, mean(p[index:(index + window - 2)]))
   else 
    mean_number = c(mean_number, mean(p[index:(index + window - 1)]))
  index = index + 1
  }
  mean_number
}

mov_avg(p, 3)
#[1] 15 30 40 50 60 70 80 85

mov_avg(p, 2)
#[1] 10 25 35 45 55 65 75 80

Upvotes: 1

Maurits Evers
Maurits Evers

Reputation: 50728

How about something like this in base R:

window <- 3
p <- c(10,20,30,40,50,60,70,80,90)

x <- c(NA, p, NA)
sapply(seq_along(x[-(1:(window - 1))]), function(i)
    mean(x[seq(i, i + window - 1)], na.rm = T))
#[1] 15 20 30 40 50 60 70 80 85

The trick is to add flanking NAs and then use mean with na.rm = T.


I know you said "without using packages", but the same is even shorter using zoo::rollapply

library(zoo)
rollapply(c(NA, p, NA), 3, mean, na.rm = T)
#[1] 15 20 30 40 50 60 70 80 85

Upvotes: 2

Related Questions