sharoz

Reputation: 6345

Find the variance over a sliding window in dplyr

I want to find the variance of the three most recent values (the current row and the two before it) within each group.

# make some data with categories a and b
library(dplyr)
df = expand.grid(
  a = LETTERS[1:3],
  index = 1:10
)
# add a variable that changes within each group
set.seed(9999)
df$x = runif(nrow(df))

# get the variance of a subset of x
varSubset = function(x, index, subsetSize) {
  subset = (index-subsetSize+1):index
  ifelse(subset[1]<1, -1, var(x[subset]))
}

df %>%
  # group the data
  group_by(a) %>%
  # get the variance of the 3 most recent values
  mutate(var3 = varSubset(x, index, 3))

This calls varSubset() once per group, with both x and index passed as whole vectors.

I can't figure out how to treat x as a vector (of only the group) and index as a single value. I've tried rowwise(), but then I effectively lose grouping.
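A minimal sketch of that behaviour, assuming the same df as above: inside a grouped dplyr verb, a function receives every column as a vector covering the whole group. The helper arg_lengths is hypothetical and only reports what it was given.

```r
library(dplyr)

# same data as in the question
df <- expand.grid(a = LETTERS[1:3], index = 1:10)
set.seed(9999)
df$x <- runif(nrow(df))

# hypothetical probe (not part of the original code): report the length
# of each argument as the function actually receives it
arg_lengths <- function(x, index) {
  paste0("x: ", length(x), ", index: ", length(index))
}

# one row per group; both arguments arrive with all 10 rows of the group
probe <- df %>%
  group_by(a) %>%
  summarise(lengths = arg_lengths(x, index))
```

Because index arrives as a length-10 vector rather than a single value, varSubset() has no per-row position to work with.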

Upvotes: 1

Views: 1543

Answers (1)

jeremycg

Reputation: 24945

Why not use rollapply() from the zoo package?

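library(dplyr)
library(zoo)

df %>%
  group_by(a) %>%
  # rolling variance of the current value and the two before it;
  # positions without a full window of 3 are filled with NA
  mutate(var3 = rollapply(x, 3, var, fill = NA, align = "right"))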
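If you would rather not depend on zoo, the same right-aligned window can be sketched in base R. This is only an illustration under some assumptions: roll_var is a hypothetical helper, not a dplyr or zoo function, and the rows are sorted by index within each group first so that "most recent" follows index order.

```r
library(dplyr)

# hypothetical helper: variance of the `width` most recent values at each
# position, NA until a full window is available
roll_var <- function(x, width) {
  vapply(seq_along(x), function(i) {
    if (i < width) NA_real_ else var(x[(i - width + 1):i])
  }, numeric(1))
}

# same data as in the question
df <- expand.grid(a = LETTERS[1:3], index = 1:10)
set.seed(9999)
df$x <- runif(nrow(df))

result <- df %>%
  group_by(a) %>%
  arrange(index, .by_group = TRUE) %>%  # order rows by index within each group
  mutate(var3 = roll_var(x, 3)) %>%     # x is the whole group's vector here
  ungroup()
```

The helper loops over positions itself, so it works with the whole-group vector that mutate() passes in. For long series rollapply() is likely the more convenient choice.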

Upvotes: 2
