rajeswa

Reputation: 47

Need to find and extract matching 'Names' from 2 different Name columns of 2 different Data Frames in R

I am new to R and need a bit of guidance. I have 2 data frames (dfs). I have already performed a series of operations on both, and I need to perform the following operation at the end.

df1 & df2

df1 <- data.frame(name = c("A","B","C","D","E","F","F","G","s","x")) # (1)

newname <- c("A","V","C","D","c","v","x") # (2) names extracted from another column; kept as a standalone vector because its 7 values cannot be assigned to the 10-row df1

df2 <- data.frame(Id_name = c("A","B","C","s","s","x","G","g")) # (3)

Step 1 = match (2) with (3) and extract the common names; call the result (4)

Step 2 = find names in (4) that have duplicate value = 1

Step 3 = delete those values from (1) and (3)

I tried using anti_join and semi_join, but I guess they work for numeric values only. Is there a specific library available for this, and how can I solve it?

Upvotes: 0

Views: 81

Answers (1)

Rui Barradas

Reputation: 76402

The strategy followed below relies on intersect and logical-index extraction:

  1. Get the common names with intersect.
  2. Remove the rows of df1 whose name is found in common.
  3. Do the same as in point 2, this time with df2$Id_name.

It is fully vectorized, so there is no need for joins.
Note also the argument drop = FALSE. The data frames posted in the question have only one column, and with the default drop = TRUE the results would lose the dim attribute and become vectors.
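
To see what drop = FALSE changes, here is a minimal sketch on a throwaway one-column data frame. The names d, v and k are hypothetical and used only for this illustration; they are not part of the question's data.

```r
# Hypothetical one-column data frame, for illustration only
d <- data.frame(x = c("a", "b", "c"))

# Default drop = TRUE: a single remaining column is simplified to a vector
v <- d[d$x != "b", ]

# drop = FALSE: the result keeps its data.frame structure
k <- d[d$x != "b", , drop = FALSE]

is.data.frame(v)  # FALSE
is.data.frame(k)  # TRUE
```

With more than one column the default would already return a data frame, so the argument only matters for single-column cases like the ones in the question.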

common <- intersect(newname, df2$Id_name)
df1 <- df1[!df1$name %in% common, , drop = FALSE]
df2 <- df2[!df2$Id_name %in% common, , drop = FALSE]
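
For reference, the same steps can be run end to end on the question's data. This is only a sketch: it assumes newname is a standalone character vector (its 7 values do not fit into the 10-row df1), and the result names df1_out and df2_out are introduced here so the inputs stay untouched. It covers the intersect-and-remove steps only, not the duplicate check in Step 2.

```r
# Data from the question; newname is ASSUMED to be a standalone vector
df1 <- data.frame(name = c("A","B","C","D","E","F","F","G","s","x"))
newname <- c("A","V","C","D","c","v","x")
df2 <- data.frame(Id_name = c("A","B","C","s","s","x","G","g"))

# Names present in both newname and df2$Id_name (matching is case-sensitive)
common <- intersect(newname, df2$Id_name)
common  # "A" "C" "x"

# Keep only the rows whose name is NOT among the common names
df1_out <- df1[!df1$name %in% common, , drop = FALSE]
df2_out <- df2[!df2$Id_name %in% common, , drop = FALSE]

df1_out$name     # "B" "D" "E" "F" "F" "G" "s"
df2_out$Id_name  # "B" "s" "s" "G" "g"
```

Because the comparison is case-sensitive, "c" in newname does not match "C", and "G" and "g" in df2 are treated as different names.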

Upvotes: 1
