Nathan123

Reputation: 763

How to obtain the unique keys of groups with non-unique values in R

I have the following data frame

fileN<-c("510-1","510-1","510-2","510-2","510-3","510-3","510-3")
disp<-c("account","fail","fail","fail","account","account","fail")

df<-data.frame(fileN,disp)
df

  fileN    disp
1 510-1 account
2 510-1    fail
3 510-2    fail
4 510-2    fail
5 510-3 account
6 510-3 account
7 510-3    fail

I want to obtain the unique fileN values whose disp column contains both fail and account.

I know I can do the following to get only the fileN groups where every disp is fail:

library(dplyr)

df %>%
  group_by(fileN) %>%
  filter(all(disp == 'fail')) %>%
  distinct()

But how do I get the fileN values that have both fail and account, so that the desired result is:

510-1
510-3

Upvotes: 1

Views: 57

Answers (2)

akrun

Reputation: 887711

One option is to group by 'fileN', keep only the groups in which all elements of the desired vector are present (%in%) in the 'disp' column, then ungroup and take the distinct values of 'fileN':

library(dplyr)
df %>% 
  group_by(fileN) %>%
  filter(all(c('fail', 'account') %in% disp)) %>%
  ungroup() %>%
  distinct(fileN)
# A tibble: 2 x 1
#  fileN
#  <fct>
#1 510-1
#2 510-3

If 'fail' and 'account' are the only possible values, another option is to drop duplicate rows and keep the groups that have exactly two rows left:

distinct(df) %>% 
    group_by(fileN) %>% 
    filter(n() == 2) %>% 
    distinct(fileN)
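
The row-count shortcut depends on 'disp' never taking a third value. The sketch below illustrates that caveat on a small made-up data frame (df2 and the "pending" value are hypothetical, not from the question), and shows one possible way to guard against it by restricting to the wanted values before counting:

```r
library(dplyr)

# Hypothetical data: 510-2 never has "account", but it does have a third
# value ("pending") that is not part of the original question.
df2 <- data.frame(
  fileN = c("510-1", "510-1", "510-2", "510-2", "510-2"),
  disp  = c("account", "fail", "fail", "fail", "pending"),
  stringsAsFactors = FALSE
)

wanted <- c("fail", "account")

# Shortcut: 510-2 has two distinct rows (fail, pending), so it is kept too.
shortcut <- distinct(df2) %>%
  group_by(fileN) %>%
  filter(n() == 2) %>%
  ungroup() %>%
  distinct(fileN)

# Guarded variant: drop other values first, then require one distinct row
# per wanted value.
robust <- df2 %>%
  filter(disp %in% wanted) %>%
  distinct() %>%
  group_by(fileN) %>%
  filter(n() == length(wanted)) %>%
  ungroup() %>%
  distinct(fileN)

shortcut$fileN
robust$fileN
```

Here the shortcut keeps both 510-1 and 510-2, while the guarded variant keeps only 510-1. If the data can only ever contain the two values, the shorter form is enough.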

Upvotes: 4

d.b

Reputation: 32558

A couple of base R options:

v = tapply(df$disp, df$fileN, function(x){
    all(c("account", "fail") %in% x)
})
v[v]
#510-1 510-3 
# TRUE  TRUE 

#OR

with(aggregate(disp ~ fileN, df, function(x){
    all(c("account", "fail") %in% x)
}), fileN[disp])
#[1] 510-1 510-3
#Levels: 510-1 510-2 510-3
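
The same question can also be read as a set intersection: the fileN values that occur with "fail" and also occur with "account". A minimal base R sketch of that reading, rebuilding the question's data so it runs on its own (the name both is only for illustration):

```r
# Rebuild the question's data; stringsAsFactors = FALSE keeps fileN as
# character regardless of the R version.
fileN <- c("510-1", "510-1", "510-2", "510-2", "510-3", "510-3", "510-3")
disp  <- c("account", "fail", "fail", "fail", "account", "account", "fail")
df <- data.frame(fileN, disp, stringsAsFactors = FALSE)

# fileN values seen with "fail", intersected with those seen with "account"
both <- intersect(df$fileN[df$disp == "fail"],
                  df$fileN[df$disp == "account"])
both
# [1] "510-1" "510-3"
```

This only works neatly for two target values; for a longer list of required values, the all(... %in% x) pattern above generalizes more directly.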

Upvotes: 2
