NewUsr_stat

Reputation: 2571

Merge 3 data.frames by column names

I have three independent data.frames with the same number of rows, the same number of columns, and the same column names. I'm trying to merge the three data.frames by column name. I'm using the following code, which was written to merge two data.frames on each column in turn and return the number of matches:

Merged_DF = sapply(names(DF1),function(n) nrow(merge(DF1, DF2, by=n)))

The problem is that this example uses two data.frames, while I have three. How can I modify the code to merge three data.frames instead of two? I tried simply adding the third data.frame to the call, but it does not work:

  Merged_DF = sapply(names(DF1),function(n) nrow(merge(DF1, DF2, DF3,  by=n)))

It returns the following error:

 Error in fix.by(by.x, x) :  'by' must specify column(s) as numbers, names or logical

Ex:

DF1

 G1  G2  G3
  a   b   f
  b   c   a
  c   d   b

DF2

 G1  G2  G3
  A   b   f
  b   c   a
  h   M   b

DF3

 G1  G2  G3
  a   b   f
  b   l   a
  j   M   v

Each of the real data.frames has around 250 rows and 50 columns.

Upvotes: 15

Views: 39095

Answers (5)

Chathura Perera

Reputation: 128

I usually use bind_rows from dplyr, which stacks the data frames row-wise:

library(dplyr)

bind_rows(df1, df2, df3)

Caveats and options are described in the documentation:

https://dplyr.tidyverse.org/reference/bind.html

Happy merging :)

Upvotes: 1

LMc

Reputation: 18612

Here is an option using tidyverse packages:

library(purrr)
library(dplyr)

reduce(list(DF1, DF2, DF3), full_join)

By default this will merge by all common columns. For more control you can specify an anonymous formula function where .x represents the accumulated value and .y represents the "next" value. The comparable expression to the above would be:

reduce(list(DF1, DF2, DF3), ~ full_join(.x, .y, by = c("G1", "G2", "G3")))

Upvotes: 1

VladimirTech31

Reputation: 57

Just in case anyone wants to combine multiple data frames that have the same column names but different numbers of rows, this article was helpful: https://medium.com/coinmonks/merging-multiple-dataframes-in-r-72629c4632a3

Basically, you use the do.call and rbind functions:

Merged <- do.call("rbind", list(df1, df2, df3, df4))
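To make the behaviour concrete, here is a small base R sketch using the sample data from the question (retyped by hand, so treat the values as an assumption). It suggests that do.call with rbind appends rows rather than matching values between data frames, so it fits stacking better than joining. The `source` column is a hypothetical addition to keep track of where each row came from.

```r
# Sample data retyped from the question (assumed values).
DF1 <- data.frame(G1 = c("a", "b", "c"), G2 = c("b", "c", "d"), G3 = c("f", "a", "b"))
DF2 <- data.frame(G1 = c("A", "b", "h"), G2 = c("b", "c", "M"), G3 = c("f", "a", "b"))
DF3 <- data.frame(G1 = c("a", "b", "j"), G2 = c("b", "l", "M"), G3 = c("f", "a", "v"))

frames <- list(DF1, DF2, DF3)

# rbind needs matching column names; rows are appended in list order.
stacked <- do.call(rbind, frames)

# Hypothetical bookkeeping column: which data frame each row came from.
stacked$source <- rep(c("DF1", "DF2", "DF3"), times = sapply(frames, nrow))

dim(stacked)  # 9 rows, 4 columns
```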

Upvotes: 3

Paul Sochacki

Reputation: 490

After researching this very same question for a couple of hours today, I came up with this simple but elegant solution using a combination of the %>% pipe (from magrittr, re-exported by 'dplyr') and the base R 'merge()' function.

library(dplyr)  # provides the %>% pipe

MergedDF <- merge(DF1, DF2) %>%
              merge(DF3)

As you mention in your post, this assumes that the column names are the same and that there's the same number of rows in each data frame you are merging. This will also automatically eliminate any duplicate columns (i.e., identifiers) that were used in the merging process.
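For reference, the same chain can be written without a pipe by nesting the calls. The sketch below uses the sample data from the question (retyped by hand, so the values are an assumption) and shows that merge() with no `by` argument performs an inner join on all shared columns, so only rows that are identical in every data frame survive.

```r
# Sample data retyped from the question (assumed values).
DF1 <- data.frame(G1 = c("a", "b", "c"), G2 = c("b", "c", "d"), G3 = c("f", "a", "b"))
DF2 <- data.frame(G1 = c("A", "b", "h"), G2 = c("b", "c", "M"), G3 = c("f", "a", "b"))
DF3 <- data.frame(G1 = c("a", "b", "j"), G2 = c("b", "l", "M"), G3 = c("f", "a", "v"))

# Step 1: inner join of DF1 and DF2 on G1, G2 and G3 together.
step1 <- merge(DF1, DF2)
nrow(step1)     # 1: only the row (b, c, a) appears in both

# Step 2: join that result with DF3; equivalent to merge(DF1, DF2) %>% merge(DF3).
MergedDF <- merge(step1, DF3)
nrow(MergedDF)  # 0: no row is identical across all three sample data frames
```

On this sample the final result is empty, so passing `all = TRUE` or an explicit `by` may be closer to what is wanted, depending on the goal.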

Upvotes: 12

rcs

Reputation: 68809

You can use the Reduce function to merge multiple data frames:

df_list <- list(DF1, DF2, DF3)
Reduce(function(x, y) merge(x, y, all=TRUE), df_list, accumulate=FALSE)

Or merge_recurse from the reshape package:

library(reshape)
data <- merge_recurse(df_list)

See also the R Wiki: Merge data frames
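If the goal is the per-column match count from the question rather than a single merged table, the Reduce idea can be combined with the original sapply. This is only a sketch under a few assumptions: the sample data is retyped from the question, `count_matches` is a hypothetical helper name, and a "match" is taken to mean a row that survives an inner join on that one column across all data frames. As far as I can tell, the original error arises because merge() accepts only two data frames, so the third one is matched positionally to `by.x`.

```r
# Sample data retyped from the question (assumed values).
DF1 <- data.frame(G1 = c("a", "b", "c"), G2 = c("b", "c", "d"), G3 = c("f", "a", "b"))
DF2 <- data.frame(G1 = c("A", "b", "h"), G2 = c("b", "c", "M"), G3 = c("f", "a", "b"))
DF3 <- data.frame(G1 = c("a", "b", "j"), G2 = c("b", "l", "M"), G3 = c("f", "a", "v"))

df_list <- list(DF1, DF2, DF3)

# Hypothetical helper: number of rows left after inner-joining every
# data frame in the list on the single column n.
count_matches <- function(n, dfs) {
  # Keep only the key column so repeated merges cannot produce
  # clashing suffixed column names (G2.x, G2.y, ...).
  keys <- lapply(dfs, function(d) d[n])
  nrow(Reduce(function(x, y) merge(x, y, by = n), keys))
}

Merged_DF <- sapply(names(DF1), count_matches, dfs = df_list)
Merged_DF
# G1 G2 G3
#  1  1  2
```

Matching is case-sensitive, so "a" in DF1 and "A" in DF2 do not count as a match in G1.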

Upvotes: 16
