Reputation: 2570
I have two data.frames, df and wf. df has one row per time point for each id. Some time points (tpoint) are missing for each id.
My second data.frame, wf, has the appropriate beginning and end tpoints for each id, i.e. spoint and epoint respectively.
So I want to fill in the missing rows in df for the missing tpoints. Below are the data.frames:
df <- read.table(text= "id Gid tpoint dat1 dat2 dat3
1 a 1 x x 55
1 a 3 x x 44
1 a 4 x x 33
2 a 2 x x 66
2 a 3 x x 43
3 b 4 x x 42
3 b 5 x x 36
4 b 4 x x 33
4 b 5 x x 65
4 b 6 x x 77
5 b 4 x x 72
5 b 5 x x 25
5 b 6 x x 12
5 b 7 x x 09",header=TRUE)
wf <- read.table(text= "id Gid spoint epoint
1 a 1 5
2 a 1 4
3 b 4 6
4 b 4 7
5 b 4 7",header=TRUE)
I figured out a way to do this below:
library(plyr)
seqlist <- apply(wf, 1, function(x) data.frame( id=x[1],
Gid=x[2],
tpoint = seq(x[3], x[4])))
# bunch of warnings but I get the result
seqdf <- ldply(seqlist, data.frame)
finaldf <- merge(seqdf, df, by=c("Gid", "id", "tpoint"), all=TRUE)
I get a bunch of ugly warnings, although I end up where I want to be. But I guess the warnings should be dealt with rather than ignored. There are infinitely many ways to skin a cat in R. Is there a much better way of doing this that I am missing?
Upvotes: 2
Views: 470
Reputation: 179388
The warnings occur because, when you use apply(), the data frame gets coerced to an array (in this case a character array).
To remove the warnings, try this:
seqlist <- apply(wf, 1, function(x){
n <- as.numeric(x[4])-as.numeric(x[3])+1
data.frame( id=rep(x[1], n), Gid=rep(x[2], n), tpoint = x[3]:x[4])
})
seqlist
[[1]]
id Gid tpoint
1 1 a 1
2 1 a 2
3 1 a 3
4 1 a 4
5 1 a 5
[[2]]
id Gid tpoint
1 2 a 1
2 2 a 2
3 2 a 3
4 2 a 4
[[3]]
id Gid tpoint
1 3 b 4
2 3 b 5
3 3 b 6
[[4]]
id Gid tpoint
1 4 b 4
2 4 b 5
3 4 b 6
4 4 b 7
[[5]]
id Gid tpoint
1 5 b 4
2 5 b 5
3 5 b 6
4 5 b 7
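As a rough sketch of how the remaining steps might look, the code below first checks the coercion described above and then builds the same sequences without apply(), so that no conversion back from character is needed. It assumes the df and wf from the question; the names row_classes and seqlist2 are made up for illustration, and Map() with do.call(rbind, ...) is just one base R option, not necessarily the best one.

```r
# Example data copied from the question
df <- read.table(text = "id Gid tpoint dat1 dat2 dat3
1 a 1 x x 55
1 a 3 x x 44
1 a 4 x x 33
2 a 2 x x 66
2 a 3 x x 43
3 b 4 x x 42
3 b 5 x x 36
4 b 4 x x 33
4 b 5 x x 65
4 b 6 x x 77
5 b 4 x x 72
5 b 5 x x 25
5 b 6 x x 12
5 b 7 x x 09", header = TRUE)
wf <- read.table(text = "id Gid spoint epoint
1 a 1 5
2 a 1 4
3 b 4 6
4 b 4 7
5 b 4 7", header = TRUE)

# apply() converts wf to a matrix first; because Gid is not numeric,
# every row reaches the function as a character vector
row_classes <- apply(wf, 1, class)

# Map() walks the columns in parallel, so each value keeps its own type
# (seqlist2 is a hypothetical name, not from the original post)
seqlist2 <- Map(
  function(id, Gid, s, e) data.frame(id = id, Gid = Gid, tpoint = seq(s, e)),
  wf$id, as.character(wf$Gid), wf$spoint, wf$epoint
)

# Stack the per-id pieces into one data frame
seqdf <- do.call(rbind, seqlist2)

# Same merge as in the question: time points missing from df get NA
finaldf <- merge(seqdf, df, by = c("id", "Gid", "tpoint"), all = TRUE)
```

For the example data this should give one row per id and tpoint within the spoint to epoint range, with NA in the dat columns for the rows that were missing from df.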
Upvotes: 1