marb_021
marb_021

Reputation: 65

How to merge two dataframes and keep only different columns (content)?

I have two data frame with same row size and different column number, the name of the columns is also different, however the content may be similar in some of them.

i.e. df1:

df1<- data.frame("a"=c("0","1","0","1","0","0","0"),
                "b"=c("1","1","1","1","1","0","0"),
                "c"=c("1","1","0","0","1","0","0"),
                "d"=c("1","1","1","1","1","1","1"))

df2:

df2<- data.frame("e"=c("1","1","0","1","0","0","0"),
                "f"=c("1","1","1","1","1","0","0"),
                "g"=c("0","0","0","0","1","0","0"),
                "h"=c("0","0","0","0","1","1","1"))

If you see, the column "b" of df1 and "f" of df2 are equal. Therefore, the result I want is a new dataframe looking like this:

df3 <- data.frame("a"=c("0","1","0","1","0","0","0"),
                  "c"=c("1","1","0","0","1","0","0"),
                  "d"=c("1","1","1","1","1","1","1"),
                  "e"=c("1","1","0","1","0","0","0"),
                  "g"=c("0","0","0","0","1","0","0"),
                  "h"=c("0","0","0","0","1","1","1"))

NOTE: column "b" and "f" (that were similar) are not in the new df3. I have looked in the web but I did not find an example for this. I think the major complexity is that the merge is by content and not by column name.

Upvotes: 1

Views: 1098

Answers (4)

akrun
akrun

Reputation: 887148

We can use outer from base R

mat <- outer(df1, df2, FUN = Vectorize(function(x, y) all(x == y)))
mat
#      e     f     g     h
#a FALSE FALSE FALSE FALSE
#b FALSE  TRUE FALSE FALSE
#c FALSE FALSE FALSE FALSE
#d FALSE FALSE FALSE FALSE

Now, we can get the row/column names

m2 <- as.matrix(subset(as.data.frame.table(mat), Freq, select = -Freq))

Now, we use the 'm2' to get remove the column names from 'df1', 'df2' and cbind

cbind(df1[setdiff(names(df1), m2[,1])], df2[setdiff(names(df2), m2[,2])])
#  a c d e g h
#1 0 1 1 1 0 0
#2 1 1 1 1 0 0
#3 0 0 1 0 0 0
#4 1 0 1 1 0 0
#5 0 1 1 0 1 1
#6 0 0 1 0 0 1
#7 0 0 1 0 0 1

Upvotes: 1

Ben Norris
Ben Norris

Reputation: 5747

Here is a more tidyverse solution.

library(dplyr)
library(tidyr)
# based on Ronak's sapply approach
matches <- as.data.frame(sapply(df1, function(x) sapply(df2, function(y) identical(x, y)))) %>%
  rownames_to_column(var = "df2") %>%
  pivot_longer(-df2, names_to = "df1") %>% # pivot longer
  filter(value) # keep only the matches

# programmatically build list of names to remove
vars_remove <- c(matches$df1, matches$df2) # will remove var names that are matches
df1 %>% bind_cols(df2) %>%
  select(-any_of(vars_remove))

  a c d e g h
1 0 1 1 1 0 0
2 1 1 1 1 0 0
3 0 0 1 0 0 0
4 1 0 1 1 0 0
5 0 1 1 0 1 1
6 0 0 1 0 0 1
7 0 0 1 0 0 1

Upvotes: 1

Ivn Ant
Ivn Ant

Reputation: 135

This would do the job:

df3 <- cbind(df1,df2)
df3 <- t(t(df3)[!(duplicated(t(df3)) | duplicated(t(df3), fromLast = TRUE)),])
df3

#  a c d e g h
#1 0 1 1 1 0 0
#2 1 1 1 1 0 0
#3 0 0 1 0 0 0
#4 1 0 1 1 0 0
#5 0 1 1 0 1 1
#6 0 0 1 0 0 1
#7 0 0 1 0 0 1

this will give you a matrix, you can save the result as a df if so desired

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388982

We can use sapply to check for the columns that perfectly match.

mat <- sapply(df1, function(x) sapply(df2, function(y) all(x == y)))
mat

#      a     b     c     d
#e FALSE FALSE FALSE FALSE
#f FALSE  TRUE FALSE FALSE
#g FALSE FALSE FALSE FALSE
#h FALSE FALSE FALSE FALSE

Here we can see column b from df1 and column f from df2 should be removed. We can do this by :

m2 <- which(mat, arr.ind = TRUE)
cbind(df1[-m2[, 2]], df2[-m2[, 1]])

#  a c d e g h
#1 0 1 1 1 0 0
#2 1 1 1 1 0 0
#3 0 0 1 0 0 0
#4 1 0 1 1 0 0
#5 0 1 1 0 1 1
#6 0 0 1 0 0 1
#7 0 0 1 0 0 1

Upvotes: 1

Related Questions