SUMIT
SUMIT

Reputation: 563

How do I compare two columns and print non matching values in each dataframe?

I would like to compare df1 and df2 and find the entry which are present df1 but not df2 and present in df2 but not in df1 in r.

Input:

df1 <- data.frame(ID1 = c("d", "p", "n", "m", "c"))
df2 <- data.frame(ID2 = c("c", "b", "a", "d", "s", "p"))

Output:

nonMatch_Uniquedf1 <- data.frame(ID1 = c("n", "m"))
nonMatch_Uniquedf2 <- data.frame(ID2 = c("b", "a", "s"))

Please note that both columns of df1 and df2 have different row numbers.

Thank you for your help.

Upvotes: 1

Views: 8433

Answers (2)

MonJeanJean
MonJeanJean

Reputation: 2906

With dplyr:

require(dplyr)

df1 %>% 
  filter(!df1$ID1 %in% df2$ID2) #For df1 values not in df2
df2 %>% 
  filter(!df2$ID2 %in% df1$ID1) #For df2 values not in df1

Edit: with the expected output:

nonMatch_Uniquedf1 <- df1 %>% 
      filter(!df1$ID1 %in% df2$ID2) #For df1 values not in df2
nonMatch_Uniquedf2 <- df2 %>% 
      filter(!df2$ID2 %in% df1$ID1) #For df2 values not in df1

Upvotes: 3

Here's another way of reaching the desired output using the anti_join function.

library(dplyr)

df1 %>%
  anti_join(df2, 
            # Define equivalence in column names in both df
            by = c("ID2" = "ID1"))
df2 %>%
  anti_join(df1, 
            # Define equivalence in column names in both df
            by = c("ID1" = "ID2"))

Upvotes: 4

Related Questions