Reputation: 959
I have several data frames I would like to combine, but I need to get rid of rows that don't have matching values in a column in the other data frames. For example, I want to merge a, b, and c data frames, based on the values in column x.
a <- data.frame(1:5, 5:9)
colnames(a) <- c("x", "y")
b <- data.frame(1:4, 7:10)
colnames(b) <- c("x", "y")
c <- data.frame(1:3, 6:8)
colnames(c) <- c("x", "y")
and have the result be
1 5
2 6
3 7
1 7
2 8
3 9
1 6
2 7
3 8
where the first three rows are from data frame a, the second three rows are from data frame b, and the third three rows are from data frame c, and the rows that didn't have matching values in column x were not included.
Upvotes: 1
Views: 1498
Reputation: 887048
We create an index based on intersect
ing elements of 'x'
v1 <- Reduce(intersect, list(a$x, b$x, c$x))
rbind(a[a$x %in% v1,], b[b$x %in% v1,], c[c$x %in% v1, ])
# x y
#1 1 5
#2 2 6
#3 3 7
#4 1 7
#5 2 8
#6 3 9
#7 1 6
#8 2 7
#9 3 8
If there are many dataset objects, it is better to keep it in a list
. Here, the example showed the object identifiers as completely different, but if the identifiers have a pattern e.g. df1, df2, ..df100
etc, it becomes easier to get it to a list
lst1 <- mget(ls(pattern = "^df\\d+$"))
If the object identifiers are all different xyz, abc, fq12
etc, but these are the only data.frame objects loaded in the global environment
lst1 <- mget(names(eapply(.GlobalEnv, 'is.data.frame')))
Then, get the interesecitng elements of the column 'x'
v1 <- Reduce(intersect, lapply(lst1, `[[`, "x"))
Use the intersecting vector to subset the rows of the list
elements
do.call(rbind, lapply(lst1, function(x) dat[dat$x %in% v1,]))
Here, we assume the column names are the same across all the datasets
Another option is to do a merge
and then unlist
out <- Reduce(function(...) merge(..., by = 'x'), list(a, b, c))
data.frame(x = out$x, y = unlist(out[-1], use.name = FALSE))
Upvotes: 3