Lisa
Lisa

Reputation: 959

rbind data based on matching values in a column

I have several data frames I would like to combine, but I need to get rid of rows that don't have matching values in a column in the other data frames. For example, I want to merge a, b, and c data frames, based on the values in column x.

a <- data.frame(1:5, 5:9)
colnames(a) <- c("x", "y")
b <- data.frame(1:4, 7:10)
colnames(b) <- c("x", "y")
c <- data.frame(1:3, 6:8)
colnames(c) <- c("x", "y")

and have the result be

1   5
2   6
3   7
1   7
2   8
3   9
1   6
2   7
3   8

where the first three rows are from data frame a, the second three rows are from data frame b, and the third three rows are from data frame c, and the rows that didn't have matching values in column x were not included.

Upvotes: 1

Views: 1498

Answers (1)

akrun
akrun

Reputation: 887048

We create an index based on intersecting elements of 'x'

v1 <- Reduce(intersect, list(a$x, b$x, c$x))
rbind(a[a$x %in% v1,], b[b$x %in% v1,], c[c$x %in% v1, ])
#  x y
#1 1 5
#2 2 6
#3 3 7
#4 1 7
#5 2 8
#6 3 9
#7 1 6
#8 2 7
#9 3 8

If there are many dataset objects, it is better to keep it in a list. Here, the example showed the object identifiers as completely different, but if the identifiers have a pattern e.g. df1, df2, ..df100 etc, it becomes easier to get it to a list

lst1 <- mget(ls(pattern = "^df\\d+$"))

If the object identifiers are all different xyz, abc, fq12 etc, but these are the only data.frame objects loaded in the global environment

lst1 <-  mget(names(eapply(.GlobalEnv, 'is.data.frame')))

Then, get the interesecitng elements of the column 'x'

v1 <- Reduce(intersect, lapply(lst1, `[[`, "x"))

Use the intersecting vector to subset the rows of the list elements

do.call(rbind, lapply(lst1, function(x) dat[dat$x %in% v1,]))

Here, we assume the column names are the same across all the datasets


Another option is to do a merge and then unlist

out <- Reduce(function(...) merge(..., by = 'x'), list(a, b, c))
data.frame(x = out$x, y = unlist(out[-1], use.name = FALSE))

Upvotes: 3

Related Questions