TIm Haus
TIm Haus

Reputation: 261

rollsumr with window-length>1: filling missing values

My data frame looks something like the first two columns of the following

I want to add a third column, equal to the sum of the ID-group's last three observations for VAL.

Using the following command, I managed to get the output below:

df %>% 
  group_by(ID) %>% 
  mutate(SUM=rollsumr(VAL, k=3)) %>% 
  ungroup()

ID VAL  SUM
1  2    NA
1  1    NA
1  3    6
1  4    8
...

I am now hoping to be able to fill the NAs that result for the group's cells in the first two rows.

ID VAL  SUM
1  2    2
1  1    3
1  3    6
1  4    8
...

How do I do that?

I have tried doing the following

df %>% 
  group_by(ID) %>% 
  mutate(SUM=rollsumr(VAL, k=min(3, row_number())) %>% 
  ungroup()

and

df %>% 
  group_by(ID) %>% 
  mutate(SUM=rollsumr(VAL, k=3), fill = "extend") %>% 
  ungroup()

But both give me the same error, because I have groups of sizes <= 2.

Evaluation error: need at least two non-NA values to interpolate.

What do I do?

Upvotes: 3

Views: 1102

Answers (2)

tmfmnk
tmfmnk

Reputation: 40171

Alternatively, you can use rollapply() from the same package:

df %>% 
 group_by(ID) %>% 
 mutate(SUM = rollapply(VAL, width = 3, FUN = sum, partial = TRUE, align = "right"))

     ID   VAL   SUM
  <int> <int> <int>
1     1     2     2
2     1     1     3
3     1     3     6
4     1     4     8

Due to argument partial = TRUE, also the rows that are below the desired window of length three are summed.

Upvotes: 3

Ronak Shah
Ronak Shah

Reputation: 389235

Not a direct answer but one way would be to replace the values which are NAs with cumsum of VAL

library(dplyr)
library(zoo)

df %>% 
  group_by(ID) %>% 
  mutate(SUM = rollsumr(VAL, k=3, fill = NA), 
         SUM = ifelse(is.na(SUM), cumsum(VAL), SUM))

#     ID   VAL   SUM
#  <int> <int> <int>
#1     1     2     2
#2     1     1     3
#3     1     3     6
#4     1     4     8

Or since you know the window size before hand, you could check with row_number() as well

df %>% 
  group_by(ID) %>% 
  mutate(SUM = rollsumr(VAL, k=3, fill = NA), 
         SUM = ifelse(row_number() < 3, cumsum(VAL), SUM))

Upvotes: 2

Related Questions