Reputation: 12856
I have the following data.
df = data.frame(email_one=c("[email protected]","[email protected]","[email protected]",
"[email protected]","[email protected]"), email_two=c("[email protected]",
"[email protected]","[email protected]","[email protected]","[email protected]"))
I'm wondering if I can use R to select values that appear in both columns, unique values that appear in just the first column, and unique values that appear in just column two.
I was initially trying to figure this out in Excel, but I'm assuming there is a more elegant solution in R, maybe even with the sqldf package. I would prefer a built-in function over a user-defined function full of conditional statements such as df$email_one == df$email_two.
Can anyone point me in the right direction?
Upvotes: 1
Views: 329
Reputation: 162321
You were right to suspect that there would be built-in functions for these operations. In this case, you want the functions intersect() and setdiff(), documented together with a few related functions on the ?intersect help page.
# Elements present in both columns
intersect(df[[1]], df[[2]])
[1] "[email protected]" "[email protected]" "[email protected]"
# Elements of column 1 that are not in column 2
setdiff(df[[1]], df[[2]])
[1] "[email protected]" "[email protected]"
# Elements of column 2 that are not in column 1
setdiff(df[[2]], df[[1]])
[1] "[email protected]" "[email protected]"
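Because the addresses in the question are not readable here, the following is a minimal self-contained sketch of the same three calls on made-up stand-in data. The addresses and the variable names both, only_one and only_two are hypothetical, chosen only for illustration. Note that the set functions discard duplicates, so an address repeated within a column shows up only once in the result.

```r
# Hypothetical stand-in data: these addresses are invented for illustration.
# stringsAsFactors = FALSE keeps the columns as character vectors on R < 4.0.
df <- data.frame(
  email_one = c("a@example.com", "b@example.com", "c@example.com",
                "d@example.com", "a@example.com"),
  email_two = c("a@example.com", "c@example.com", "e@example.com",
                "f@example.com", "c@example.com"),
  stringsAsFactors = FALSE
)

# Values present in both columns (duplicates are dropped)
both <- intersect(df$email_one, df$email_two)

# Values found only in the first column
only_one <- setdiff(df$email_one, df$email_two)

# Values found only in the second column
only_two <- setdiff(df$email_two, df$email_one)

print(both)      # "a@example.com" "c@example.com"
print(only_one)  # "b@example.com" "d@example.com"
print(only_two)  # "e@example.com" "f@example.com"
```

Note that setdiff() is not symmetric, so the argument order decides which column's unique values you get back.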
Upvotes: 4