R Dplyr: Adding Missing Rows into Gaps in a Column of Integers

Question

I am using the dplyr package in R to "summarise" some input data.

Step 1: Count the number of records (N), on a given date (Date), with a certain index value (Idx) in additional to another categorical value in column A.

T <- orig_data %>% group_by(A, Date, Idx) %>% summarise(N=n())

Step 2: Create a cumlative sum of the number of counts on a given date with a given Idx value or above.

T2 <- T %>% mutate(cN=cumsum(N))

However, some values of Idx between its maximum and minimum are not present in the table, e.g. Idx=10 is missing in the example below.

A   Date        Idx N   cN
N   2020-10-01  8   18  85
N   2020-10-01  9   6   91
N   2020-10-01  11  1   92
N   2020-10-01  13  10  102

I would like to add a row for each missing Idx value, on each day when it is missing, e.g.

N   2020-10-01  10  0   91

I tried using the simplest possible version of the fill function to attempt this, but it produced no noticeable change in the results (maybe because Idx is being interpreted as a list and not a vector?)

T3 <- T2 %>% fill(cN)

Is there a way to easily add the above row, and identify any other missing rows and add them to the table?

www · Accepted Answer

A solution using tidyr's functions: complete and nesting. I saw you commented in another post saying you have tried the complete function, but it is slow. Try nesting the columns to see if that helps.

library(dplyr)
library(tidyr)

dat2 <- dat %>%
  complete(nesting(A, Date), Idx = full_seq(Idx, period = 1), fill = list(N = 0)) %>%
  fill(cN)
dat2
# # A tibble: 6 x 5
#   A     Date         Idx     N    cN
#            
# 1 N     2020-10-01     8    18    85
# 2 N     2020-10-01     9     6    91
# 3 N     2020-10-01    10     0    91
# 4 N     2020-10-01    11     1    92
# 5 N     2020-10-01    12     0    92
# 6 N     2020-10-01    13    10   102

R Dplyr: Adding Missing Rows into Gaps in a Column of Integers

Answers (2)

Related Questions