bshor

Reputation: 5059

Impute/fill in missing values between time periods

I have data that often has missing observations between time periods. I want to fill in those observations, incrementing the time period one step at a time, but only when the observations on either side of the gap share the same values (here, the same id and group). Here's an example:

df <- data.frame(id=c("a","a","b","b"), group=c("x","x","y","z"), year=c(2000,2003,2003,2005))

This gives the following four-observation data frame:

  id group year
1  a     x 2000
2  a     x 2003
3  b     y 2003
4  b     z 2005

I would like to have two additional observations here (between #1 and #2), for 2001 and 2002, since observations #1 and #2 match on id and group. But I don't want any additional observations between #3 and #4, because although their ids match, their groups do not.
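The rule described above (fill in years only within a matching id/group pair) can be sketched in base R without any packages. This is an illustration, not code from the thread: it splits the rows by every id/group combination that actually occurs, then replaces each combination's years with the full run from its earliest to its latest year. It assumes the years are whole numbers one period apart; the names `pieces`, `filled` and `result` are just for illustration.

```r
df <- data.frame(id = c("a", "a", "b", "b"),
                 group = c("x", "x", "y", "z"),
                 year = c(2000, 2003, 2003, 2005))

# One piece per (id, group) combination; drop = TRUE skips combinations
# that never occur in the data (e.g. id "a" with group "y").
pieces <- split(df, list(df$id, df$group), drop = TRUE)

# Within each piece, keep its id and group and expand the years
# from the earliest to the latest observed year, one year apart.
filled <- lapply(pieces, function(p) {
  data.frame(id = p$id[1], group = p$group[1],
             year = seq(min(p$year), max(p$year), by = 1))
})

# Stack the pieces, sort them, and reset the row names.
result <- do.call(rbind, filled)
result <- result[order(result$id, result$group, result$year), ]
rownames(result) <- NULL
print(result)
```

Because the pair (b, y) and the pair (b, z) land in different pieces, no years are added between 2003 and 2005 for id b.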

Upvotes: 1

Views: 134

Answers (2)

akrun

Reputation: 887911

Or, using data.table:

library(data.table)
setDT(df)[, .(year = year[1]:year[.N]), .(id, group)]
#   id group year
#1:  a     x 2000
#2:  a     x 2001
#3:  a     x 2002
#4:  a     x 2003
#5:  b     y 2003
#6:  b     z 2005
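Note that year[1]:year[.N] takes the first and last rows of each group, so it relies on the rows already being sorted by year within each id/group, as they are in the example. A small base R check (with a hypothetical out-of-order vector `yrs`) shows what happens otherwise, and why min(year):max(year) is a safer choice when that ordering isn't guaranteed:

```r
# Hypothetical years for one id/group, deliberately out of order
yrs <- c(2003, 2000)

# First-to-last, as year[1]:year[.N] does: counts down instead of up
first_last <- yrs[1]:yrs[length(yrs)]

# Earliest-to-latest: the intended gap fill regardless of row order
min_max <- min(yrs):max(yrs)

print(first_last)  # 2003 2002 2001 2000
print(min_max)     # 2000 2001 2002 2003
```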

Upvotes: 1

pogibas

Reputation: 28379

You can use full_seq from tidyr; it was created for exactly this kind of task ("Create the full sequence of values in a vector"). Grouping by id and group first means the sequence is only filled within each matching pair:

library(tidyr)
library(dplyr)
df %>%
  group_by(id, group) %>%
  complete(year = full_seq(year, period = 1))

  id    group  year
  <fct> <fct> <dbl>
1 a     x      2000
2 a     x      2001
3 a     x      2002
4 a     x      2003
5 b     y      2003
6 b     z      2005
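To see what complete() fills in for each group, here is a simplified base R approximation of full_seq for numeric input. The helper full_seq_sketch is hypothetical and is not tidyr's implementation: it just returns every value from the smallest to the largest observation, period apart, whereas the real function also guards against inputs that don't fall on that regular grid.

```r
# Hypothetical helper approximating tidyr::full_seq() for numeric input:
# every value from min(x) to max(x), `period` apart, ignoring NAs.
# It does not check that x actually lies on that grid.
full_seq_sketch <- function(x, period) {
  seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), by = period)
}

# Group (a, x) observed in 2000 and 2003: the gap is filled in
gap_filled <- full_seq_sketch(c(2000, 2003), period = 1)

# Group (b, y) observed only in 2003: nothing to fill
single <- full_seq_sketch(2003, period = 1)

print(gap_filled)
print(single)
```

Applied per id/group, this yields 2000 to 2003 for (a, x) and leaves the single years of (b, y) and (b, z) alone, which matches the output above.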

Upvotes: 3
