I am trying to extract rows of a dataset based on a list of time points nested within individuals. Some time points are repeated (and therefore have exactly the same variable values), but I still want to keep the duplicated rows. How can I achieve that in base R?
Here is the original dataset:
xx <- data.frame(id=rep(1:3, each=3), time=1:3, y=rep(1:3, each=3))
Here is the list of time points, one numeric vector per id (the list names are the ids):
lst <- list(`1` = c(1, 1, 2), `2` = c(1, 3, 3), `3` = c(2, 2, 3))
Desired outcome:
id time y
1 1 1
1 1 1 #this is the duplicated row
1 2 1
2 1 2
2 3 2
2 3 2 #this is the duplicated row
3 2 3
3 2 3 #this is the duplicated row
3 3 3
The code
do.call(rbind, Map(function(p, q) subset(xx, id == q & time %in% p), lst, names(lst)))
did not work for me because subset removes duplicated rows.
Upvotes: 1
The issue is that %in% returns one TRUE/FALSE per row of xx, so a row is selected at most once no matter how often its time appears in p; subset itself does not drop duplicates. To repeat rows, we need to also iterate (lapply) over p internally. I'll wrap your inner subset in another do.call(rbind, lapply(p, ...)) to get what you expect:
do.call(rbind, Map(function(p, q) {
  do.call(rbind, lapply(p, function(p0) subset(xx, id == q & time %in% p0)))
}, lst, names(lst)))
# id time y
# 1.1 1 1 1
# 1.2 1 1 1
# 1.21 1 2 1
# 2.4 2 1 2
# 2.6 2 3 2
# 2.61 2 3 2
# 3.8 3 2 3
# 3.81 3 2 3
# 3.9 3 3 3
(Row names are a distraction here ...)
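To make the difference concrete, here is a minimal sketch for id 1 only. It contrasts %in% with match(), which is a swapped-in alternative to the lapply wrapper above and not part of the original code. The names rows_id1, idx and by_match are made up for illustration, and the sketch assumes each (id, time) pair occurs only once in xx, since match() returns just the first hit.

```r
# Data from the question
xx  <- data.frame(id = rep(1:3, each = 3), time = 1:3, y = rep(1:3, each = 3))
lst <- list(`1` = c(1, 1, 2), `2` = c(1, 3, 3), `3` = c(2, 2, 3))

p        <- lst[["1"]]          # requested time points for id 1: 1, 1, 2
rows_id1 <- xx[xx$id == 1, ]    # the three rows belonging to id 1

# %in% asks, per row of the data, "is this time in p?".
# The result has one element per row, so each row is kept at most once.
in_p <- rows_id1$time %in% p

# match() asks, per element of p, "which row has this time?".
# A repeated time point therefore yields a repeated row index.
# Assumption: each time occurs once per id (match() returns the first hit only).
idx      <- match(p, rows_id1$time)
by_match <- rows_id1[idx, ]
```

Indexing with idx repeats the first row of id 1, which is the behaviour the question asks for; filtering with in_p cannot do that because a logical index selects each row once.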
An alternative would be to convert your lst into a frame of id and time, and then left-join on it:
frm <- do.call(rbind, Map(function(x, nm) data.frame(id = nm, time = x), lst, names(lst)))
frm
# id time
# 1.1 1 1
# 1.2 1 1
# 1.3 1 2
# 2.1 2 1
# 2.2 2 3
# 2.3 2 3
# 3.1 3 2
# 3.2 3 2
# 3.3 3 3
merge(frm, xx, by = c("id", "time"), all.x = TRUE)
# id time y
# 1 1 1 1
# 2 1 1 1
# 3 1 2 1
# 4 2 1 2
# 5 2 3 2
# 6 2 3 2
# 7 3 2 3
# 8 3 2 3
# 9 3 3 3
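One thing to keep in mind: merge sorts its result on the key columns by default (sort = TRUE), so the output order matches lst here only because each vector in lst is already ascending. If the requested order matters, a sketch along these lines may help. The ord column and the as.integer() conversion of the list names are additions for illustration, not part of the code above.

```r
# Data from the question
xx  <- data.frame(id = rep(1:3, each = 3), time = 1:3, y = rep(1:3, each = 3))
lst <- list(`1` = c(1, 1, 2), `2` = c(1, 3, 3), `3` = c(2, 2, 3))

# Build the lookup frame. as.integer() is an added step so that id has
# the same type as xx$id (names(lst) are character strings).
frm <- do.call(rbind, Map(function(x, nm) data.frame(id = as.integer(nm), time = x),
                          lst, names(lst)))

# Hypothetical helper column that remembers the requested row order
frm$ord <- seq_len(nrow(frm))

# Left join, then restore the requested order and drop the helper column
res <- merge(frm, xx, by = c("id", "time"), all.x = TRUE)
res <- res[order(res$ord), c("id", "time", "y")]
rownames(res) <- NULL
res
```

With all.x = TRUE, any (id, time) pair in lst that has no counterpart in xx would still appear in res, with y set to NA.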
Two good resources for learning about merges/joins:
Upvotes: 2