Reputation: 65
I have two data frame with same row size and different column number, the name of the columns is also different, however the content may be similar in some of them.
i.e. df1:
df1<- data.frame("a"=c("0","1","0","1","0","0","0"),
"b"=c("1","1","1","1","1","0","0"),
"c"=c("1","1","0","0","1","0","0"),
"d"=c("1","1","1","1","1","1","1"))
df2:
df2<- data.frame("e"=c("1","1","0","1","0","0","0"),
"f"=c("1","1","1","1","1","0","0"),
"g"=c("0","0","0","0","1","0","0"),
"h"=c("0","0","0","0","1","1","1"))
If you see, the column "b" of df1 and "f" of df2 are equal. Therefore, the result I want is a new dataframe looking like this:
df3 <- data.frame("a"=c("0","1","0","1","0","0","0"),
"c"=c("1","1","0","0","1","0","0"),
"d"=c("1","1","1","1","1","1","1"),
"e"=c("1","1","0","1","0","0","0"),
"g"=c("0","0","0","0","1","0","0"),
"h"=c("0","0","0","0","1","1","1"))
NOTE: column "b" and "f" (that were similar) are not in the new df3. I have looked in the web but I did not find an example for this. I think the major complexity is that the merge is by content and not by column name.
Upvotes: 1
Views: 1098
Reputation: 887148
We can use outer
from base R
mat <- outer(df1, df2, FUN = Vectorize(function(x, y) all(x == y)))
mat
# e f g h
#a FALSE FALSE FALSE FALSE
#b FALSE TRUE FALSE FALSE
#c FALSE FALSE FALSE FALSE
#d FALSE FALSE FALSE FALSE
Now, we can get the row/column names
m2 <- as.matrix(subset(as.data.frame.table(mat), Freq, select = -Freq))
Now, we use the 'm2' to get remove the column names from 'df1', 'df2' and cbind
cbind(df1[setdiff(names(df1), m2[,1])], df2[setdiff(names(df2), m2[,2])])
# a c d e g h
#1 0 1 1 1 0 0
#2 1 1 1 1 0 0
#3 0 0 1 0 0 0
#4 1 0 1 1 0 0
#5 0 1 1 0 1 1
#6 0 0 1 0 0 1
#7 0 0 1 0 0 1
Upvotes: 1
Reputation: 5747
Here is a more tidyverse
solution.
library(dplyr)
library(tidyr)
# based on Ronak's sapply approach
matches <- as.data.frame(sapply(df1, function(x) sapply(df2, function(y) identical(x, y)))) %>%
rownames_to_column(var = "df2") %>%
pivot_longer(-df2, names_to = "df1") %>% # pivot longer
filter(value) # keep only the matches
# programmatically build list of names to remove
vars_remove <- c(matches$df1, matches$df2) # will remove var names that are matches
df1 %>% bind_cols(df2) %>%
select(-any_of(vars_remove))
a c d e g h
1 0 1 1 1 0 0
2 1 1 1 1 0 0
3 0 0 1 0 0 0
4 1 0 1 1 0 0
5 0 1 1 0 1 1
6 0 0 1 0 0 1
7 0 0 1 0 0 1
Upvotes: 1
Reputation: 135
This would do the job:
df3 <- cbind(df1,df2)
df3 <- t(t(df3)[!(duplicated(t(df3)) | duplicated(t(df3), fromLast = TRUE)),])
df3
# a c d e g h
#1 0 1 1 1 0 0
#2 1 1 1 1 0 0
#3 0 0 1 0 0 0
#4 1 0 1 1 0 0
#5 0 1 1 0 1 1
#6 0 0 1 0 0 1
#7 0 0 1 0 0 1
this will give you a matrix
, you can save the result as a df
if so desired
Upvotes: 1
Reputation: 388982
We can use sapply
to check for the columns that perfectly match.
mat <- sapply(df1, function(x) sapply(df2, function(y) all(x == y)))
mat
# a b c d
#e FALSE FALSE FALSE FALSE
#f FALSE TRUE FALSE FALSE
#g FALSE FALSE FALSE FALSE
#h FALSE FALSE FALSE FALSE
Here we can see column b
from df1
and column f
from df2
should be removed. We can do this by :
m2 <- which(mat, arr.ind = TRUE)
cbind(df1[-m2[, 2]], df2[-m2[, 1]])
# a c d e g h
#1 0 1 1 1 0 0
#2 1 1 1 1 0 0
#3 0 0 1 0 0 0
#4 1 0 1 1 0 0
#5 0 1 1 0 1 1
#6 0 0 1 0 0 1
#7 0 0 1 0 0 1
Upvotes: 1