ATMathew

Reputation: 12856

Select similar and unique values in a data frame

I have the following data.

df = data.frame(email_one=c("[email protected]","[email protected]","[email protected]",
        "[email protected]","[email protected]"), email_two=c("[email protected]",
        "[email protected]","[email protected]","[email protected]","[email protected]"))

I'm wondering if I can use R to select values that appear in both columns, unique values that appear in just the first column, and unique values that appear in just column two.

I was initially trying to figure this out in Excel, but I'm assuming there is a more elegant solution in R, maybe even with the sqldf package. Preferably it would use a built-in function rather than a user-defined function full of conditional statements (e.g. df$email_one == df$email_two).

Can anyone help point me in the right direction?

Upvotes: 1

Views: 329

Answers (1)

Josh O'Brien

Reputation: 162321

You were right to suspect that there would be built-in functions for these operations. In this case, you want the functions intersect() and setdiff(), documented together with a few related functions on the ?intersect help page.

# Elements present in both columns
intersect(df[[1]], df[[2]])
[1] "[email protected]" "[email protected]" "[email protected]"

# Elements of column 1 that are not in column 2 
setdiff(df[[1]], df[[2]])
[1] "[email protected]"  "[email protected]"

# Elements of column _2_ that are not in column _1_
setdiff(df[[2]], df[[1]])
[1] "[email protected]" "[email protected]" 
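
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The addresses are made-up placeholders (not the ones from the question), and the names both, only_one and only_two are just illustrative. stringsAsFactors = FALSE is set so the columns are plain character vectors in older versions of R as well.

```r
# Hypothetical stand-in data: these addresses are invented for illustration
df <- data.frame(
  email_one = c("a@example.com", "b@example.com", "c@example.com",
                "d@example.com", "e@example.com"),
  email_two = c("a@example.com", "b@example.com", "c@example.com",
                "x@example.com", "y@example.com"),
  stringsAsFactors = FALSE
)

# Values present in both columns
both <- intersect(df$email_one, df$email_two)

# Values only in the first column
only_one <- setdiff(df$email_one, df$email_two)

# Values only in the second column
only_two <- setdiff(df$email_two, df$email_one)

print(both)
print(only_one)
print(only_two)
```

Note that both functions treat their arguments as sets, so any duplicates within a column are dropped from the result.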

Upvotes: 4
