Reputation: 35
I need to apply a rolling function to a large dataset, but it has to be applied within the grouped windows specified in group_by. Passing the function to mutate returns a "subscript out of bounds" error on some subsets of the data but works on others. I can't share my data because it is confidential, but I found a similar enough reproducible example.
I tried tracing the error, which led me to believe it is caused by the function returning a one-dimensional matrix. Coercing the result to a data frame or a vector did not help.
library(tidyverse)
library(caTools)
# Works
i <- iris %>%
  group_by(Species) %>%
  arrange(Sepal.Length, .by_group = TRUE) %>%
  mutate(q_low = runquantile(lag(Sepal.Length), 3, probs = 0.2, endrule = 'NA', align = 'right'))
# Gives error
# Grouping by and arranging by more than one variable
m <- mtcars %>%
  group_by(cyl, gear) %>%
  arrange(gear, .by_group = TRUE) %>%
  mutate(q_low = runquantile(lag(drat), 3, probs = 0.2, endrule = 'NA', align = 'right'))
#> Error in `[<-`(`*tmp*`, (k2 + 1):n, , value = y[1:(n - k2), ]): subscript out of bounds
Created on 2019-11-05 by the reprex package (v0.3.0)
I expected a result similar to the iris example, with the function applied within each group and the NAs in the right spots. Instead, I get the error above.
Upvotes: 2
Views: 2385
Reputation: 887241
In the first example, every group has 50 rows, but that is not true in the second case: some of the groups have only 1 row.
runquantile(rnorm(4), 3, probs =0.2, endrule = 'NA',align='right')
#[1] NA NA -0.5466295 -0.4099716
The above works, but it fails if the number of elements is less than 'k':
runquantile(rnorm(1), 3, probs = 0.2, endrule = 'NA', align = 'right')
# Error in `[<-`(`*tmp*`, (k2 + 1):n, , value = y[1:(n - k2), ]) :
#   subscript out of bounds
We need to handle those cases with an if/else condition:
library(dplyr)
library(caTools)
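mtcars %>%
  group_by(cyl, gear) %>%
  arrange(gear, .by_group = TRUE) %>%
  # Return NA for groups smaller than the window size; otherwise compute the rolling quantile
  mutate(q_low = if (n() < 3) NA_real_ else
    runquantile(lag(drat), 3, probs = 0.2, endrule = 'NA', align = 'right'))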
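To see which groups fall under the n() < 3 guard, it may help to count the rows per group first. The following is a minimal sketch in base R on the built-in mtcars data; the names group_sizes and small are only illustrative and not part of the original code.

```r
# Count rows for each (cyl, gear) combination in the built-in mtcars data.
# aggregate() with FUN = length returns the number of rows per group.
group_sizes <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = length)
names(group_sizes)[3] <- "n"
print(group_sizes)

# Groups with fewer rows than the window size of 3 used in the question.
# These are the ones that receive NA under the if/else guard.
small <- group_sizes[group_sizes$n < 3, ]
print(small)
```

Any group listed in small is shorter than the rolling window, so it gets NA instead of being passed to runquantile.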
Upvotes: 1