Francis Smart

Reputation: 4055

R: Check if all values of one column match uniquely all values of another column

I have a data set with a lot of values. Most values of x match a single value of y uniquely. However, some values of y are shared by several different values of x. Is there an easy way to find which values of y map to multiple xs?

mydata <- data.frame(x = c(letters,letters), y=c(LETTERS,LETTERS))
mydata$y[c(3,5)] <- "A"
mydata$y[c(10,15)] <- "Z"
mydata %>% foo   # foo is a placeholder for the function I am looking for
# desired output:
# [1] "A" "Z"

I apologize if I am missing some obvious command here.

Upvotes: 1

Views: 1798

Answers (3)

akrun

Reputation: 887118

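If we also need the corresponding unique values of 'x' for each such 'y':

library(data.table)
# keep y groups with more than one distinct x; uniqueN() ignores repeated rows
setDT(mydata)[, if (uniqueN(x) > 1) toString(unique(x)), by = y]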
#    y      V1
#1: A a, c, e
#2: Z j, o, z
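
As a small sketch of why the test should count distinct 'x' values rather than rows: the toy data below is hypothetical (it is not the data from the question), and the names `by_rows` and `by_distinct` are made up for illustration. A row count flags a 'y' that merely repeats one 'x' several times, while a distinct count does not.

```r
library(data.table)

# Hypothetical data: y = "B" repeats the same x three times,
# while y = "A" is shared by two different x values.
dt <- data.table(x = c("a", "c", "b", "b", "b"),
                 y = c("A", "A", "B", "B", "B"))

# Counting rows per y flags "B" (3 rows) and misses "A" (2 rows).
by_rows <- dt[, .(rows = .N), by = y][rows > 2, y]

# Counting distinct x per y flags only "A".
by_distinct <- dt[, .(n_x = uniqueN(x)), by = y][n_x > 1, y]

by_rows      # "B"
by_distinct  # "A"
```

So a row-count threshold only works when every (x, y) pair is repeated the same number of times, as happens to be the case in the question's example.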

Upvotes: 1

nsheff

Reputation: 3253

Use data.table:

library(data.table)
setDT(mydata)
# count distinct x per y, then keep the y values that have more than one
mydata[, list(n = length(unique(x))), by = y][n > 1, ]
#       y n
#    1: A 3
#    2: Z 3
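
If only the 'y' values are wanted as a plain vector (the format shown in the question), one possible sketch is to select the `y` column in the final step. It rebuilds the question's data with `stringsAsFactors = FALSE`, which is an assumption made here so that `y` is a character column; `multi_y` and `n_x` are illustrative names.

```r
library(data.table)

# Rebuild the example data from the question (character columns assumed).
mydata <- data.frame(x = c(letters, letters), y = c(LETTERS, LETTERS),
                     stringsAsFactors = FALSE)
mydata$y[c(3, 5)] <- "A"
mydata$y[c(10, 15)] <- "Z"

dt <- as.data.table(mydata)

# Count distinct x per y, keep groups with more than one, return y as a vector.
multi_y <- dt[, .(n_x = uniqueN(x)), by = y][n_x > 1, y]
multi_y
# [1] "A" "Z"
```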

Upvotes: 1

Gopala

Reputation: 10483

Using dplyr, you can do:

library(dplyr)
mydata <- data.frame(x = letters, y=LETTERS, stringsAsFactors = FALSE)
mydata$y[c(3,5)] <- "A"
mydata$y[c(10,15)] <- "Z"
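# n_distinct(x) counts distinct x values per y, so repeated (x, y) rows are not miscounted
mydata %>% group_by(y) %>% filter(n_distinct(x) > 1)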

If you want just the y values, store the filtered result in a data frame and take the unique values of y:

df <- mydata %>% group_by(y) %>% filter(n_distinct(x) > 1)
unique(df$y)

Alternatively, the following returns the same values as a single-column data frame rather than a vector:

mydata %>% group_by(y) %>% filter(n_distinct(x) > 1) %>% select(y) %>% distinct()
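
For completeness, the same idea can be sketched in base R without any packages, assuming the doubled data from the question: drop repeated (x, y) pairs first, then count how often each y remains. The names `pairs`, `counts` and `multi_y` are illustrative only.

```r
# Rebuild the example data from the question (character columns assumed).
mydata <- data.frame(x = c(letters, letters), y = c(LETTERS, LETTERS),
                     stringsAsFactors = FALSE)
mydata$y[c(3, 5)] <- "A"
mydata$y[c(10, 15)] <- "Z"

# Keep each distinct (x, y) pair once, so repeated rows do not inflate counts.
pairs <- unique(mydata[, c("x", "y")])

# Tabulate y over the distinct pairs; a count above 1 means several x share that y.
counts <- table(pairs$y)
multi_y <- names(counts)[counts > 1]
multi_y
# [1] "A" "Z"
```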

Upvotes: 1
