Reputation: 4055
I have a data set with a lot of values. Most values of x pair with a single value of y. However, some values of y are matched by several different x values. Is there an easy way to find which values of y map to multiple xs?
mydata <- data.frame(x = c(letters,letters), y=c(LETTERS,LETTERS))
mydata$y[c(3,5)] <- "A"
mydata$y[c(10,15)] <- "Z"
mydata %>% foo
[1] "A" "Z"
I apologize if I am missing some obvious command here.
Upvotes: 1
Views: 1798
Reputation: 887118
If we also need the corresponding unique values of 'x' for each such 'y', we can use data.table:
library(data.table)
# keep only the y groups with more than one distinct x, and list those x values
setDT(mydata)[, if (uniqueN(x) > 1) toString(unique(x)), by = y]
# y V1
#1: A a, c, e
#2: Z j, o, z
Upvotes: 1
Reputation: 3253
Use data.table to count the distinct x values per y and keep the groups that have more than one:
library(data.table)
setDT(mydata)
mydata[, list(n = length(unique(x))), by = y][n > 1, ]
# y n
# 1: A 3
# 2: Z 3
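If only the y values are wanted as a plain vector, as in the question's desired output, the same idea can be sketched with `uniqueN()`. This is a minimal sketch that rebuilds the question's data; the names `dt` and `multi_y` are made up for illustration.

```r
library(data.table)

# Rebuild the question's data: every x appears twice
dt <- data.table(x = c(letters, letters), y = c(LETTERS, LETTERS))
dt[c(3, 5), y := "A"]
dt[c(10, 15), y := "Z"]

# Count distinct x per y, keep groups with more than one, return y as a vector
multi_y <- dt[, .(n = uniqueN(x)), by = y][n > 1, y]
multi_y
# [1] "A" "Z"
```

Counting distinct x values rather than rows means a y that is only repeated with the same x is not reported.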
Upvotes: 1
Reputation: 10483
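Using dplyr, you can do the following (note that this example builds the data with each x appearing only once, so counting rows per y is enough):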
library(dplyr)
mydata <- data.frame(x = letters, y=LETTERS, stringsAsFactors = FALSE)
mydata$y[c(3,5)] <- "A"
mydata$y[c(10,15)] <- "Z"
mydata %>% group_by(y) %>% filter(n() > 1)
If you want just the y values, store the filtered result in a data frame and take its unique y values:
df <- mydata %>% group_by(y) %>% filter(n() > 1)
unique(df$y)
Alternatively, you can get the same values in a single pipeline. This returns a single-column data frame instead of a vector:
mydata %>% group_by(y) %>% filter(n() > 1) %>% select(y) %>% distinct()
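On the question's original data, where every x appears twice, counting rows with `n()` would also flag y values that simply repeat the same x. A possible variant, sketched here under that assumption, counts distinct x values with `n_distinct()` instead; the name `multi_y` is made up for illustration.

```r
library(dplyr)

# The question's data: every x appears twice
mydata <- data.frame(x = c(letters, letters), y = c(LETTERS, LETTERS),
                     stringsAsFactors = FALSE)
mydata$y[c(3, 5)] <- "A"
mydata$y[c(10, 15)] <- "Z"

# Keep only the y groups that contain more than one distinct x,
# then reduce to the unique y values as a character vector
multi_y <- mydata %>%
  group_by(y) %>%
  filter(n_distinct(x) > 1) %>%
  pull(y) %>%
  unique()

multi_y
# [1] "A" "Z"
```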
Upvotes: 1