user1320502

Reputation: 2570

fill-in absent rows in data.frame

I have two data.frames, df and wf.

df has one row per time point for each id. Some time points (tpoint) are missing for each id.

My second data.frame, wf, holds the correct first and last tpoint for each id, i.e. spoint and epoint respectively.

So I want to add rows to df for the missing tpoints. Below are the data.frames:

df <- read.table(text= "id Gid tpoint dat1 dat2 dat3
                     1   a    1     x     x  55
                     1   a    3     x     x  44
                     1   a    4     x     x  33
                     2   a    2     x     x  66
                     2   a    3     x     x  43
                     3   b    4     x     x  42
                     3   b    5     x     x  36
                     4   b    4     x     x  33
                     4   b    5     x     x  65
                     4   b    6     x     x  77
                     5   b    4     x     x  72
                     5   b    5     x     x  25
                     5   b    6     x     x  12
                     5   b    7     x     x  09",header=TRUE)

wf <- read.table(text= "id Gid spoint epoint
                     1   a    1     5
                     2   a    1     4
                     3   b    4     6
                     4   b    4     7
                     5   b    4     7",header=TRUE)

I figured out a way to do this below:

library(plyr)

seqlist  <- apply(wf, 1, function(x) data.frame( id=x[1], 
                                                 Gid=x[2],
                                                 tpoint = seq(x[3], x[4])))
# bunch of warnings but I get the result

seqdf    <- ldply(seqlist, data.frame)
finaldf  <- merge(seqdf, df, by=c("Gid", "id", "tpoint"), all=TRUE)

I get a bunch of ugly warnings, although I get where I want to be. I guess the warnings should be eliminated, though. There are many ways to skin a cat in R. Is there a much better way to do this that I am missing?

Upvotes: 2

Views: 470

Answers (1)

Andrie

Reputation: 179388

The warnings occur because:

  1. In the call to apply(), the data frame gets coerced to a matrix (in this case a character matrix, because Gid is not numeric).
  2. This means each row is now a named character vector.
  3. When data.frame() recycles the named single elements x[1] and x[2] to the length of tpoint, R warns that it is discarding the names, which it had tried to use as row names.
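
The coercion described above can be checked directly. This is only a small sketch: wf is the data frame from the question, while m, first_row and w are illustrative names that do not appear in the original code.

```r
# Same wf as in the question
wf <- read.table(text = "id Gid spoint epoint
1 a 1 5
2 a 1 4
3 b 4 6
4 b 4 7
5 b 4 7", header = TRUE)

# apply() works on a matrix, so the data frame is converted first;
# one non-numeric column (Gid) is enough to make every cell character
m <- as.matrix(wf)
is.character(m)

# This is what the function inside apply(wf, 1, ...) receives for row 1:
# a character vector named after the columns
first_row <- m[1, ]
names(first_row)

# Mimic the pattern from the question: a named length-one element is
# recycled against a longer column. The warning message is captured
# as a string instead of being printed.
w <- tryCatch(
  data.frame(id = first_row[1], tpoint = 1:5),
  warning = function(cond) conditionMessage(cond)
)
w
```

Printing w should show the text of the warning the question refers to.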

To remove the warnings, try this:

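seqlist  <- apply(wf, 1, function(x){
  # x is a named character vector, so convert before doing arithmetic
  n <- as.numeric(x[4]) - as.numeric(x[3]) + 1
  # repeat id and Gid to full length so that nothing has to be recycled
  data.frame(id = rep(x[1], n), Gid = rep(x[2], n), tpoint = x[3]:x[4])
})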

seqlist
[[1]]
  id Gid tpoint
1  1   a      1
2  1   a      2
3  1   a      3
4  1   a      4
5  1   a      5

[[2]]
  id Gid tpoint
1  2   a      1
2  2   a      2
3  2   a      3
4  2   a      4

[[3]]
  id Gid tpoint
1  3   b      4
2  3   b      5
3  3   b      6

[[4]]
  id Gid tpoint
1  4   b      4
2  4   b      5
3  4   b      6
4  4   b      7

[[5]]
  id Gid tpoint
1  5   b      4
2  5   b      5
3  5   b      6
4  5   b      7
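
If apply() is not a requirement, the coercion can be avoided altogether. The following is a sketch of one alternative, not part of the fix above: it uses base R's Map() to walk over the columns of wf in parallel, so id and the time points keep their numeric type, and then finishes with the same merge() as in the question. The names seqdf and finaldf are taken from the question; df and wf are the question's data.

```r
df <- read.table(text = "id Gid tpoint dat1 dat2 dat3
1 a 1 x x 55
1 a 3 x x 44
1 a 4 x x 33
2 a 2 x x 66
2 a 3 x x 43
3 b 4 x x 42
3 b 5 x x 36
4 b 4 x x 33
4 b 5 x x 65
4 b 6 x x 77
5 b 4 x x 72
5 b 5 x x 25
5 b 6 x x 12
5 b 7 x x 09", header = TRUE)

wf <- read.table(text = "id Gid spoint epoint
1 a 1 5
2 a 1 4
3 b 4 6
4 b 4 7
5 b 4 7", header = TRUE)

# One small data frame per row of wf. The arguments are plain unnamed
# values, so data.frame() recycles id and Gid without any row-name warning.
seqlist <- Map(
  function(id, Gid, s, e) data.frame(id = id, Gid = Gid, tpoint = seq(s, e)),
  wf$id, as.character(wf$Gid), wf$spoint, wf$epoint
)

# Stack the pieces into the full grid of expected time points
seqdf <- do.call(rbind, seqlist)

# Rows that exist only in seqdf get NA in dat1, dat2 and dat3
finaldf <- merge(seqdf, df, by = c("Gid", "id", "tpoint"), all = TRUE)
finaldf
```

Because nothing passes through a character matrix here, tpoint in seqdf stays numeric and no as.numeric() conversion is needed.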

Upvotes: 1
