Adding missing data conditional on grouping variables within data frame

Question

Here's a four column df. Site, Visit, Ind(ividual), Obs(erved)

site<-c(rep("X",6),rep("Y",8),rep("Z",4))
visit<-c(1,1,2,2,3,3,1,1,2,2,3,3,4,4,1,1,2,2)
ind<-c(rep(c("a","b"),9))    
obs<-1    
dat<-as.data.frame(cbind(site,visit,ind,obs))

In this example I have three sites with unequal visits (X=3, Y=4, Z=2). I would like to add visits for site X and Z that "did not occur" for both individuals (a and b) and have a NA's in the observed column. As in this example:

site<-c(rep("X",8),rep("Y",8),rep("Z",8))
visit<-c(1,1,2,2,3,3,4,4,1,1,2,2,3,3,4,4,1,1,2,2,3,3,4,4)
ind<-c(rep(c("a","b"),12))    
obs<-c(rep(1,6),NA,NA,rep(1,12),rep(NA,4))    
dat2<-as.data.frame(cbind(site,visit,ind,obs))

This is a much simpler version of a very large data set with 500+ sites and 300+ individuals. I'm struggling with a quick way to accomplish what I'm striving for. Simple solutions out there? Thanks.

Also can anyone think of a better title for this post?

Jaehyeon Kim · Accepted Answer

I'd do as shown below. All possible matches of site, ind, visit are created by expand.grid(). Then data is joined to it, which is left outer join that keeps all the expanded values (all.x = TRUE)

# your data
site<-c(rep("X",6),rep("Y",8),rep("Z",4))
visit<-c(1,1,2,2,3,3,1,1,2,2,3,3,4,4,1,1,2,2)
ind<-c(rep(c("a","b"),9))
obs <- 1
dat<-as.data.frame(cbind(site,visit,ind,obs))
# all matches of site, ind, visit
site <- c("X", "Y", "Z")
ind <- c("a", "b")
visit <- c(1, 2, 3, 4)
grid <- expand.grid(site = site, ind = ind, visit = visit)
# merge - left outer join that keeps all grid values
merge(grid, dat, by = c("site", "ind", "visit"), all.x = TRUE)
   site ind visit  obs
1     X   a     1    1
2     X   a     2    1
3     X   a     3    1
4     X   a     4 
5     X   b     1    1
6     X   b     2    1
7     X   b     3    1
8     X   b     4 
9     Y   a     1    1
10    Y   a     2    1
11    Y   a     3    1
12    Y   a     4    1
13    Y   b     1    1
14    Y   b     2    1
15    Y   b     3    1
16    Y   b     4    1
17    Z   a     1    1
18    Z   a     2    1
19    Z   a     3 
20    Z   a     4 
21    Z   b     1    1
22    Z   b     2    1
23    Z   b     3 
24    Z   b     4

Adding missing data conditional on grouping variables within data frame

Answers (1)

Related Questions