sa90210
sa90210

Reputation: 585

How to calculate rolling standard deviations in dplyr

I have the following data frame in R.

data <- structure(list(Date = structure(c(18682, 18683, 18684, 18687, 
18688), class = "Date"), Apple = c(125.349998, 120.989998, 121.260002, 
127.790001, 125.120003), Amazon = c(3159.530029, 3057.159912, 
3092.929932, 3146.139893, 3094.530029), Facebook = c(264.309998, 
254.690002, 257.619995, 264.910004, 259), Google = c(2083.810059, 
2015.949951, 2021.910034, 2069.659912, 2064.47998), Netflix = c(553.409973, 
546.700012, 538.849976, 550.640015, 547.820007)), row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))

I'm looking for an easy way to calculate rolling standard deviations on each column in dplyr. If my dataframe was a zoo object, the solution could probably look something like this

library(tidyverse)
data %>% mutate_at(.vars = vars(2:6), .funs = ~zoo::rollapply(., width = 2, FUN = sd))

Any ideas on how I can tweak the .funs command to work on a tbl object?

Upvotes: 1

Views: 2201

Answers (3)

Simon Woodward
Simon Woodward

Reputation: 2026

Here's a method using zoo:

  library(zoo)

  rollsd <- function(x, ...){
    n <- rollsum(!is.na(x), ...)
    s <- rollsum(x, ...)
    ss <- rollsum(x * x, ...)
    v <- (ss - s * s / n) / (n - 1)
    return(sqrt(v))
  }
  rollsd(1:10, k = 10, align = "right", na.pad = TRUE)
  sd(1:10)

Upvotes: 0

G. Grothendieck
G. Grothendieck

Reputation: 270055

Here are several possibilities. Note that it may be that what you want is rollapplyr with an r on the end (denoting right window alignment rather than the default of center alignment) using the same arguments but we can't tell without a full explanation in the question of what is wanted.

1a) The code in the question works if you add fill=NA to the rollapply call although mutate_at is deprecated in favor of across.

1b) to use mutate/across use this instead.

library(dplyr)
library(zoo)

data %>% mutate(across(-Date, ~ rollapply(., 2, sd, fill = NA)))

1c) Another possibility is to use %<>% like this:

library(magrittr)
library(zoo)

data2 <- data
data2[-1] %<>% rollapply(2, sd, fill = NA)

2) convert to zoo in which case the code is just this because zoo objects don't require NA filling like data frames do and since the dates are not represented as the first column we don't need to use special measures to avoid applying to it.

library(zoo)
z <- read.zoo(data)
rollapply(z, 2, sd)

Upvotes: 1

Jon Spring
Jon Spring

Reputation: 66880

I like the slider package for these:

data %>%
  mutate(across(-Date, ~slider::slide_dbl(.x, sd, .before = Inf)))
  # or use `.before = 2` if you want to look back at the
  #  two prior values (3 in total)

# A tibble: 5 x 6
  Date       Apple Amazon Facebook Google Netflix
  <date>     <dbl>  <dbl>    <dbl>  <dbl>   <dbl>
1 2021-02-24 NA      NA      NA      NA     NA   
2 2021-02-25  3.08   72.4     6.80   48.0    4.74
3 2021-02-26  2.44   52.0     4.93   37.6    7.29
4 2021-03-01  3.30   47.5     5.03   34.0    6.33
5 2021-03-02  2.91   42.1     4.40   30.3    5.49

slider also lets you size the window by an index column, so you could calculate "standard deviation over the last 2 days" (vs. last 2 observations) by using slider::slide_index_dbl().

Upvotes: 3

Related Questions