Evelyne De Man
Evelyne De Man

Reputation: 23

How to merge multiple data frames that have partly the same column names based on a key variable?

I have multiple data frames (around 20, here only shown 3 simplified ones) with some overlapping column names. The key variable is "id" and I would like to merge the data frames based on this key variable. There should not be created extra columns. On the other hand I would like to avoid duplicate rows, such that rows with the same key are combined to fill in as much as empty fields as possible while having only one row per key.

I already tried "rbind.fill" but while it fills in correctly the columns, it creates duplicate rows. On the other hand if I try "merge" it gives me back an empty data frame.

df1<- cbind.data.frame(id=c(1,2,3,4), price= c(15,16,20,25), color= c("Black", NA, "White", "Green"), weight= c(5,6,10,12))
df2<- cbind.data.frame(id=c(3,4,5,6), price=c(NA, NA, 23,30), weight=c(10,12,NA, NA), battery= c("low", "high", NA, NA))
df3<- cbind.data.frame(id=c(5,6,7,8), weight= c(NA, 15,17,NA), battery= c("low", "high","high", NA), surface= c(100,115,NA, NA))


df_list <- list(df1,df2,df3)
df5<-Reduce(function(d1, d2) merge(d1, d2, by = "id"),df_list)

library(plyr)
df6 <- rbind.fill(df1,df2,df3)

I Expect the output to be like this data frame:

df4 <- cbind.data.frame(id=c(1,2,3,4,5,6,7,8), price= c(15,16,20,25,23,30,17,NA),color= c("Black", NA, "White", "Green", NA, NA, NA, NA),weight= c(5,6,10,12,NA, 15,NA,NA), battery= c(NA, NA,"low", "high","low", "high","high", NA), surface= c(NA, NA, NA, NA,100,115,NA, NA))

Upvotes: 2

Views: 1033

Answers (1)

Aron Strandberg
Aron Strandberg

Reputation: 3080

With dplyr:

df_list <- list(df1,df2,df3)

library(dplyr)

bind_rows(df_list) %>%
  group_by(id) %>%
  summarise_all(~first(na.omit(.)))

I'm assuming that non-missing fields will match across the data frames, and therefore just pick the first one observed.

Result

# A tibble: 8 x 6
     id price color weight battery surface
  <dbl> <dbl> <fct>  <dbl> <fct>     <dbl>
1     1    15 Black      5 NA           NA
2     2    16 NA         6 NA           NA
3     3    20 White     10 low          NA
4     4    25 Green     12 high         NA
5     5    23 NA        NA low         100
6     6    30 NA        15 high        115
7     7    NA NA        17 high         NA
8     8    NA NA        NA NA           NA

Upvotes: 3

Related Questions