sg15

Reputation: 35

Subscript out of bounds error due to passing mutate (dplyr) a matrix

I need to apply a rolling function to a large dataset, but it has to be applied within the groups defined by group_by(). Passing the function to mutate() returns a "subscript out of bounds" error on some subsets of the data but works on others. I can't share my data because it is confidential, but I found a reproducible example that is similar enough.

I tried tracing the error, which led me to believe it is caused by the function returning a matrix (with a single column). Coercing the result to a data frame or a vector did not help.

library(tidyverse)
library(caTools)

# Works
i <- iris %>%
  group_by(Species) %>%
  arrange(Sepal.Length, .by_group = TRUE) %>%
  mutate(q_low = runquantile(lag(Sepal.Length), 3, probs = 0.2, endrule = 'NA', align = 'right'))

# Gives error 
# Grouping by and arranging by more than one variable
m <- mtcars %>%
  group_by(cyl, gear) %>%
  arrange(gear, .by_group = TRUE) %>%
  mutate(q_low = runquantile(lag(drat), 3, probs = 0.2, endrule = 'NA', align = 'right'))
#> Error in `[<-`(`*tmp*`, (k2 + 1):n, , value = y[1:(n - k2), ]): subscript out of bounds

Created on 2019-11-05 by the reprex package (v0.3.0)

I expected a result similar to the iris example: the function applied within each group, with the NAs in the right places. Instead, I get the error.

Upvotes: 2

Views: 2385

Answers (1)

akrun

Reputation: 887241

In the first example, every group has 50 rows. In the second example that is not the case: some groups have only 1 row.
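One way to confirm this is to count the rows in each (cyl, gear) group before calling runquantile(). The sketch below uses only base R (table() and as.data.frame()) on the built-in mtcars data; the variable names are illustrative, and k = 3 is simply the window size from the question's runquantile() call.

```r
# Count rows in each cyl x gear combination of the built-in mtcars data
group_sizes <- table(cyl = mtcars$cyl, gear = mtcars$gear)
print(group_sizes)

# Window size used in the runquantile() call from the question
k <- 3

# Flatten the table and keep the non-empty groups with fewer than k rows
# (empty combinations never become groups in group_by(), so n > 0 drops them)
counts <- as.data.frame(group_sizes, responseName = "n")
short_groups <- counts[counts$n > 0 & counts$n < k, ]
print(short_groups)
```

For mtcars this lists five groups with fewer than three rows, two of which (4 cylinders with 3 gears, 6 cylinders with 5 gears) contain a single car.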

runquantile(rnorm(4), 3, probs =0.2, endrule = 'NA',align='right')
#[1]         NA         NA -0.5466295 -0.4099716

The above works, but it fails if the number of elements is less than 'k':

runquantile(rnorm(1), 3, probs =0.2, endrule = 'NA',align='right') 

# Error in `[<-`(`*tmp*`, (k2 + 1):n, , value = y[1:(n - k2), ]) :
#   subscript out of bounds


We need to handle those groups with an if/else condition on the group size, n():

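library(dplyr)
library(caTools)
mtcars %>%
  group_by(cyl, gear) %>%
  arrange(gear, .by_group = TRUE) %>%
  # groups with fewer rows than the window (k = 3) get NA instead of calling runquantile()
  mutate(q_low = if (n() < 3) NA_real_ else
      runquantile(lag(drat), 3, probs = 0.2, endrule = 'NA', align = 'right'))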
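The same size guard can also be written without dplyr. The sketch below uses base R's ave() to apply a function group by group. The helper names apply_if_long_enough and roll_mean_right are hypothetical, and a right-aligned rolling mean built with stats::filter(sides = 1) stands in for runquantile() so the example runs without caTools; lag() is left out for simplicity.

```r
# Hypothetical helper: run `f` on a group's values only when the group has
# at least `k` rows; shorter (or empty) groups get NA of the same length.
apply_if_long_enough <- function(x, k, f) {
  if (length(x) < k) rep(NA_real_, length(x)) else f(x)
}

# Hypothetical stand-in for runquantile(): right-aligned rolling mean over
# 3 values; the first two positions are NA. stats:: avoids dplyr's filter().
roll_mean_right <- function(v) as.numeric(stats::filter(v, rep(1 / 3, 3), sides = 1))

# Sort by cyl, then gear, like the grouped dplyr pipeline
mt <- mtcars[order(mtcars$cyl, mtcars$gear), ]

# ave() splits drat by every cyl x gear combination, applies FUN to each
# piece, and writes the results back into the original row positions
mt$q_low <- ave(
  mt$drat, mt$cyl, mt$gear,
  FUN = function(v) apply_if_long_enough(v, k = 3, f = roll_mean_right)
)

head(mt[, c("cyl", "gear", "drat", "q_low")])
```

Keeping the length check in a small wrapper means the rolling function itself never sees a group shorter than its window, which is the same idea as the if/else inside mutate().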

Upvotes: 1
