Stefano Potter

dplyr: fill missing time-series values with NA by group

I have a data frame like so:

library(tidyverse)

# Make some example data
df <- tibble(ID = c(1, 1, 2, 2),
             Year = c(2000, 2003, 2000, 2003),
             Value = c(1, 1, 1, 1))

     ID  Year Value
  <dbl> <dbl> <dbl>
1     1  2000     1
2     1  2003     1
3     2  2000     1
4     2  2003     1

It is missing the years 2001, 2002, 2004 and 2005. I would like to group by the ID column, add the missing years, and fill the Value column with NaN for them. My expected output is:

wanted <- tibble(ID = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
                 Year = c(2000, 2001, 2002, 2003, 2004, 2005, 2000, 2001, 2002, 2003, 2004, 2005),
                 Value = c(1, NaN, NaN, 1, NaN, NaN, 1, NaN, NaN, 1, NaN, NaN))

      ID  Year Value
   <dbl> <dbl> <dbl>
 1     1  2000     1
 2     1  2001   NaN
 3     1  2002   NaN
 4     1  2003     1
 5     1  2004   NaN
 6     1  2005   NaN
 7     2  2000     1
 8     2  2001   NaN
 9     2  2002   NaN
10     2  2003     1
11     2  2004   NaN
12     2  2005   NaN

I have looked into the complete and fill functions in the tidyverse, but I can't quite get them to do this.

Ideally I would like to supply the sequence of years I want in the Year column, and then fill the Value column with NaN for every missing year. This is only a simplified example; here the wanted sequence would be seq(2000, 2005, 1).
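The fill() function mentioned above does something different from what is wanted here: it replaces NA with the nearest non-missing value (by default the last one above it), so the gaps would become 1 rather than NaN. A small illustration, using hypothetical data for a single ID where the missing years already exist as NA rows:

```r
library(tidyverse)

# Hypothetical data for one ID where 2001 and 2002 already exist as NA rows
x <- tibble(Year = 2000:2003, Value = c(1, NA, NA, 1))

# fill() copies the last non-missing Value downward (.direction = "down" by default)
filled <- x %>% fill(Value)
filled$Value
#> [1] 1 1 1 1
```

The relevant tool is instead the fill argument of complete(), which sets a fixed replacement value for the rows that complete() adds.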


Answers (1)

www

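We can use the complete function to do the job. Within each ID group it adds a row for every Year in the supplied sequence, and fill = list(Value = NaN) puts NaN in the Value column of the added rows.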

library(tidyverse)

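df2 <- df %>%
  group_by(ID) %>%
  # full_seq(Year, period = 1) would only span the years observed in each
  # group (2000 to 2003), so pass the wanted sequence explicitly instead
  complete(Year = seq(2000, 2005, by = 1), fill = list(Value = NaN)) %>%
  ungroup()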
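Because the wanted year sequence is fixed, grouping is not strictly necessary: complete() can cross every ID with every year directly. A minimal sketch of that variant, using the question's df and the question's seq(2000, 2005, 1); the name df3 is just for illustration:

```r
library(tidyverse)

# Input data from the question
df <- tibble(ID = c(1, 1, 2, 2),
             Year = c(2000, 2003, 2000, 2003),
             Value = c(1, 1, 1, 1))

# Cross every ID with every year in the wanted sequence; rows that were
# not in df get Value = NaN via the fill argument
df3 <- df %>%
  complete(ID, Year = seq(2000, 2005, by = 1), fill = list(Value = NaN))

df3
```

This should give the same 12 rows as the question's wanted tibble, ordered by ID and then Year. With group_by(ID), complete() only sees the IDs present in each group; without it, listing ID inside complete() also adds IDs that might be missing from some years, which is equivalent here since both IDs appear in the data.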
