Reputation: 5059
I have data that often contains missing observations between time periods. I want to fill in those observations, properly incrementing the time periods, but conditional on the values of the observations. Here's an example:
df <- data.frame(id=c("a","a","b","b"), group=c("x","x","y","z"), year=c(2000,2003,2003,2005))
This gives a data frame with 4 observations:
id group year
1 a x 2000
2 a x 2003
3 b y 2003
4 b z 2005
I would like to have 2 additional observations here (between #1 and #2) for 2001 and 2002, since observations #1 and #2 match on id and group. But I don't want an additional observation between #3 and #4, because their id and group do not both match.
Upvotes: 1
Reputation: 887911
Or using data.table
library(data.table)
setDT(df)[, .(year = year[1]:year[.N]), .(id, group)]
# id group year
#1: a x 2000
#2: a x 2001
#3: a x 2002
#4: a x 2003
#5: b y 2003
#6: b z 2005
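Note that `year[1]:year[.N]` relies on the rows being sorted by year within each id/group pair. If that is not guaranteed, a variant built on `min()` and `max()` may be safer. The following is only a sketch: `df2` is a hypothetical copy of the example data with the rows for id "a" deliberately out of order, and `as.data.table()` is used instead of `setDT()` so the original data frame is left untouched.

```r
library(data.table)

# Hypothetical copy of the question's data, with the two "a" rows swapped
df2 <- data.frame(id    = c("a", "a", "b", "b"),
                  group = c("x", "x", "y", "z"),
                  year  = c(2003, 2000, 2003, 2005))

# seq(min, max) does not depend on the row order within each (id, group) pair
res <- as.data.table(df2)[, .(year = seq(min(year), max(year))),
                          by = .(id, group)]
res
```

With the original `year[1]:year[.N]` expression, the same input would produce the years for id "a" in descending order instead.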
Upvotes: 1
Reputation: 28379
You can use full_seq from tidyr. It was created exactly for tasks like this ("Create the full sequence of values in a vector"):
library(tidyr)
library(dplyr)
df %>%
group_by(id, group) %>%
complete(year = full_seq(year, period = 1))
id group year
<fct> <fct> <dbl>
1 a x 2000
2 a x 2001
3 a x 2002
4 a x 2003
5 b y 2003
6 b z 2005
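If the data frame has other columns besides id, group and year, the rows that `complete()` adds get `NA` in those columns. A small sketch of this, assuming a hypothetical extra column called `value` that is not part of the original question:

```r
library(tidyr)
library(dplyr)

# The question's data plus a hypothetical measurement column `value`
df3 <- data.frame(id    = c("a", "a", "b", "b"),
                  group = c("x", "x", "y", "z"),
                  year  = c(2000, 2003, 2003, 2005),
                  value = c(1.5, 2.5, 3.5, 4.5))

out <- df3 %>%
  group_by(id, group) %>%
  complete(year = full_seq(year, period = 1)) %>%
  ungroup()

# The inserted rows (2001 and 2002 for id "a") have value == NA
out
```

If a placeholder other than `NA` is wanted, the `fill` argument of `complete()` takes a named list of replacement values, for example `fill = list(value = 0)`.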
Upvotes: 3