Sheila

Unique and non-unique lists of values in a data frame in R

Suppose I have two data frames:

Data frame 1 (let's call this Data1):

V1     V2     
1     "AB"    
3     "XY"
5     "DH"
8     "ST"
7     "RE"

Code for Data1:

V1 <- c(1,3,5,8,7)
V2 <- c("AB","XY", "DH", "ST","RE")
Data1 <- data.frame(V1,V2)

Data frame 2 (let's call this Data2):

V1     V2     
1     "AB"    
2     "ZZ"
3     "XY"
5     "DH"
8     "ST" 

Code for Data2:

V1 <- c(1,2,3,5,8)
V2 <- c("AB","ZZ","XY","DH","ST")
Data2 <- data.frame(V1,V2)

Notice that Data2's second row (where V2 is "ZZ") is not present in Data1, AND the last row of Data1 (where V2 is "RE") is not present in Data2.

A) I would like to make a list of all V2 values that appear in only one of the data frames (i.e., that are NOT present in both).
For this example, that would be "ZZ" and "RE".

B) I would like to make a list of all V2 values that ARE present in both data frames.
For this example, the result would be "AB", "XY", "DH", "ST".

Answers (2)

Ricardo Saporta

You are looking for setdiff() and intersect() (both are documented under ?setdiff):

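inters <- intersect(Data2$V2, Data1$V2)
inters
[1] "AB" "XY" "DH" "ST"

setdf <- c(setdiff(Data2$V2, Data1$V2), setdiff(Data1$V2, Data2$V2))
setdf
[1] "ZZ" "RE"

These results assume V2 is a character column, which is the default for data.frame() since R 4.0.0; on older versions, create the data frames with stringsAsFactors = FALSE.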
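Equivalently, the values found in only one data frame are the union of both V2 columns minus their intersection. A minimal sketch in base R, assuming V2 is stored as character; the variable names in_both and in_one_only are illustrative:

```r
# Sample data from the question; stringsAsFactors = FALSE keeps V2 as
# character on R versions older than 4.0.0 as well.
Data1 <- data.frame(V1 = c(1, 3, 5, 8, 7),
                    V2 = c("AB", "XY", "DH", "ST", "RE"),
                    stringsAsFactors = FALSE)
Data2 <- data.frame(V1 = c(1, 2, 3, 5, 8),
                    V2 = c("AB", "ZZ", "XY", "DH", "ST"),
                    stringsAsFactors = FALSE)

# B) values present in both data frames
in_both <- intersect(Data1$V2, Data2$V2)

# A) symmetric difference: every value in either column, minus the shared ones
in_one_only <- setdiff(union(Data1$V2, Data2$V2), in_both)

print(in_both)      # [1] "AB" "XY" "DH" "ST"
print(in_one_only)  # [1] "RE" "ZZ"
```

union() keeps values in order of first appearance, so "RE" comes before "ZZ" here, unlike the c(setdiff(...), setdiff(...)) version above.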

canary_in_the_data_mine

You can use the %in% operator to test whether each V2 value also appears in the other data frame. Negate it with ! to find the values that are missing from the other data frame, then bind the resulting rows from both data frames together. For the shared values, drop the ! and remove the duplicate rows with unique():

> rbind(Data1[!Data1$V2 %in% Data2$V2,], Data2[!Data2$V2 %in% Data1$V2,])
  V1 V2
5  7 RE
2  2 ZZ
> unique(rbind(Data1[Data1$V2 %in% Data2$V2,], Data2[Data2$V2 %in% Data1$V2,]))
  V1 V2
1  1 AB
2  3 XY
3  5 DH
4  8 ST

On this last piece: if each V2 value is always paired with the same V1 value in both data frames, you can simply write

Data1[Data1$V2 %in% Data2$V2,]

and save yourself some lines of code.
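Since the question asks for lists of V2 values rather than rows, the same %in% masks can also be applied to the V2 columns directly. A minimal sketch, assuming V2 is a character column; the names only_one and in_both are illustrative:

```r
# Sample data from the question, with V2 kept as character
Data1 <- data.frame(V1 = c(1, 3, 5, 8, 7),
                    V2 = c("AB", "XY", "DH", "ST", "RE"),
                    stringsAsFactors = FALSE)
Data2 <- data.frame(V1 = c(1, 2, 3, 5, 8),
                    V2 = c("AB", "ZZ", "XY", "DH", "ST"),
                    stringsAsFactors = FALSE)

# A) %in% binds more tightly than !, so !x %in% y means !(x %in% y)
only_one <- c(Data1$V2[!Data1$V2 %in% Data2$V2],
              Data2$V2[!Data2$V2 %in% Data1$V2])

# B) unique() guards against repeated V2 values within Data1
in_both <- unique(Data1$V2[Data1$V2 %in% Data2$V2])

print(only_one)  # [1] "RE" "ZZ"
print(in_both)   # [1] "AB" "XY" "DH" "ST"
```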
