Select similar and unique values in a data frame

Question

I have the following data.

df = data.frame(email_one=c("one@gkn.com","two@wern.com","three@fu.cin",
        "four@huo.com","five@hoi.com"), email_two=c("ten@hoinse.com",
        "four@huo.com","two@wern.com","five@hoi.com","six@ihoio.com"))

I'm wondering if I can use R to select values that appear in both columns, unique values that appear in just the first column, and unique values that appear in just column two.

I was initially trying to figure this out in Excel, but I'm assuming there is a more elegant solution in R, maybe even with the sqldf package. I'd prefer a built-in function over a user-defined function full of conditional statements such as (df$email_one == df$email_two).

Can anyone help point me in the right direction?

Josh O'Brien · Accepted Answer

You were right to suspect that there would be built-in functions for these operations. In this case, you want the functions intersect() and setdiff(), documented together with a few related functions on the ?intersect help page.

# Elements present in both columns
intersect(df[[1]], df[[2]])
[1] "two@wern.com" "four@huo.com" "five@hoi.com"

# Elements of column 1 that are not in column 2 
setdiff(df[[1]], df[[2]])
[1] "one@gkn.com"  "three@fu.cin"

# Elements of column _2_ that are not in column _1_
setdiff(df[[2]], df[[1]])
[1] "ten@hoinse.com" "six@ihoio.com"
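
If you also want to keep the matching rows of the data frame rather than just the values, a minimal sketch along the following lines may help. It rebuilds the data from the question, assumes the columns are character vectors (hence stringsAsFactors = FALSE, which matters on R versions before 4.0), and uses the base functions %in% and union() alongside the ones above. The variable names (both, only_one, only_two, in_both, all_emails) are only illustrative.

```r
# Same data as in the question; stringsAsFactors = FALSE keeps the columns as character
df <- data.frame(
  email_one = c("one@gkn.com", "two@wern.com", "three@fu.cin",
                "four@huo.com", "five@hoi.com"),
  email_two = c("ten@hoinse.com", "four@huo.com", "two@wern.com",
                "five@hoi.com", "six@ihoio.com"),
  stringsAsFactors = FALSE
)

# The three sets asked for in the question
both     <- intersect(df$email_one, df$email_two)
only_one <- setdiff(df$email_one, df$email_two)
only_two <- setdiff(df$email_two, df$email_one)

# Logical vector: TRUE where the value in email_one occurs anywhere in email_two
in_both <- df$email_one %in% df$email_two

# Rows of df whose email_one value also appears in email_two
df[in_both, ]

# Every distinct address across both columns
all_emails <- union(df$email_one, df$email_two)
```

Note that %in% compares each value against the whole other column, not against the value in the same row, which is why it behaves differently from df$email_one == df$email_two.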
