Filter rows based on two criteria

Question

My dataframe looks like this:

Key   Year    Type
A     2000    ok
A     2001    ok
A     2001    notok
A     2002    ok
A     2003    ok
B     2000    ok
B     2001    ok
B     2001    ok
B     2002    ok
B     2003    ok
C     2000    ok
C     2001    ok
C     2002    ok
C     2003    ok

I am looking for code that returns all rows of every Key that has two observations in the same Year, one with Type "notok" and the other with Type "ok". Key B should not appear in the new dataframe: it does have two observations in one year, but both are marked "ok".

So the answer should look like this:

Key   Year    Type
A     2000    ok
A     2001    ok
A     2001    notok
A     2002    ok
A     2003    ok

Is there a simple way to do this?

akrun · Accepted Answer

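If the check has to hold within a single 'Year', group by 'Key' and 'Year' first, count how many of "ok" and "notok" occur in each group, then keep every 'Key' in which at least one year contains both: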

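library(dplyr)

df1 %>%
   group_by(Key, Year) %>% 
   # n is 2 only when this Key/Year group contains both "ok" and "notok"
   mutate(n = sum(c("ok", "notok") %in% Type)) %>% 
   group_by(Key) %>% 
   # keep every row of a Key in which at least one year has both values
   filter(any(n == 2)) %>%
   select(-n)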
# A tibble: 5 x 3
# Groups:   Key [1]
#  Key    Year Type 
#  <chr> <int> <chr>
#1 A      2000 ok   
#2 A      2001 ok   
#3 A      2001 notok
#4 A      2002 ok   
#5 A      2003 ok

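Or using ave from base R. The inner ave counts the distinct 'Type' values per 'Key' and 'Year'; the outer ave flags every row of a 'Key' in which any year has two of them: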

i1 <- with(df1, ave(ave(Type, Key, Year, FUN = 
        function(x) length(unique(x)))==2, Key, FUN = any))
df1[i1,]
# Key Year  Type
#1   A 2000    ok
#2   A 2001    ok
#3   A 2001 notok
#4   A 2002    ok
#5   A 2003    ok
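
To make the nested call easier to follow, here is a sketch that unpacks it into two steps. The intermediate names n_types, has_two and keep are introduced only for illustration and are not part of the original answer; df1 is rebuilt from the data section so the block runs by itself.

```r
# Same 14 rows as df1 in the "data" section
df1 <- data.frame(
  Key  = rep(c("A", "B", "C"), times = c(5, 5, 4)),
  Year = c(2000L, 2001L, 2001L, 2002L, 2003L,
           2000L, 2001L, 2001L, 2002L, 2003L,
           2000L, 2001L, 2002L, 2003L),
  Type = c("ok", "ok", "notok", rep("ok", 11)),
  stringsAsFactors = FALSE
)

# Step 1: number of distinct Type values within each Key/Year group.
# ave() writes its results back into a copy of its first argument, so
# the counts come back as character strings here ("1", "2").
n_types <- ave(df1$Type, df1$Key, df1$Year,
               FUN = function(x) length(unique(x)))
has_two <- as.integer(n_types) == 2L

# Step 2: spread the flag to every row of a Key that has at least
# one such year.
keep <- ave(has_two, df1$Key, FUN = any)

df1[keep, ]
```

As in the original one-liner, counting distinct values only works as a test for "both ok and notok" if 'Type' holds no other values.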

Or using split with table. Note that this version looks at each 'Key' as a whole rather than at each year:

subset(df1, Key %in% names(which(sapply(split(df1[-1], Key), 
     function(x) ncol(table(x))==2))))
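
The ncol(table(x)) test is terse, so the following sketch shows what it inspects for two of the keys. The name pieces is hypothetical; df1 is rebuilt from the data section so the block is self-contained.

```r
# Same 14 rows as df1 in the "data" section
df1 <- data.frame(
  Key  = rep(c("A", "B", "C"), times = c(5, 5, 4)),
  Year = c(2000L, 2001L, 2001L, 2002L, 2003L,
           2000L, 2001L, 2001L, 2002L, 2003L,
           2000L, 2001L, 2002L, 2003L),
  Type = c("ok", "ok", "notok", rep("ok", 11)),
  stringsAsFactors = FALSE
)

# One data frame (Year, Type) per Key
pieces <- split(df1[-1], df1$Key)

# table() on a two-column data frame gives a Year x Type contingency
# table, so its column count is the number of distinct Type values.
tab_A <- table(pieces$A)
tab_B <- table(pieces$B)

tab_A        # columns: notok, ok
ncol(tab_A)  # two distinct Type values for key A
ncol(tab_B)  # only "ok" for key B
```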

If, as the expected output suggests, it is enough for a 'Key' to contain both "ok" and "notok" anywhere in the 'Type' column, group by 'Key' only and keep the groups in which both values occur:

df1 %>%
  group_by(Key) %>% 
  filter(all(c("ok", "notok") %in% Type))
# A tibble: 5 x 3
# Groups:   Key [1]
#  Key    Year Type 
#  <chr> <int> <chr>
#1 A      2000 ok   
#2 A      2001 ok   
#3 A      2001 notok
#4 A      2002 ok   
#5 A      2003 ok

If 'Type' contains no values other than 'ok' and 'notok', we can instead count the distinct values per 'Key' and keep the groups that have two:

df1 %>% 
   group_by(Key) %>%
   filter(n_distinct(Type)==2)
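
The by-year and by-key versions give the same result on df1, but they are not equivalent in general. As a sketch, assume a hypothetical key D (not in the original data) whose "ok" and "notok" fall in different years. The checks are written with base R tapply rather than dplyr so the block runs without extra packages; df2, has_both, keys_by_key and keys_by_year are illustrative names.

```r
# Hypothetical data: key D has both values, but never in the same year
df2 <- data.frame(
  Key  = c("A", "A", "A", "D", "D"),
  Year = c(2000L, 2001L, 2001L, 2000L, 2001L),
  Type = c("ok", "ok", "notok", "ok", "notok"),
  stringsAsFactors = FALSE
)

has_both <- function(x) all(c("ok", "notok") %in% x)

# Key-only check: both values occur somewhere within the Key
keys_by_key <- names(which(tapply(df2$Type, df2$Key, has_both)))

# Key-and-Year check: both values occur within one Year of the Key
in_year <- tapply(df2$Type, list(df2$Key, df2$Year), has_both)
keys_by_year <- names(which(apply(in_year, 1, any, na.rm = TRUE)))

keys_by_key    # A and D
keys_by_year   # A only

df2[df2$Key %in% keys_by_year, ]
```

If the requirement really is "two observations in the same year", the grouping by 'Key' and 'Year' is the one to use.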

data

df1 <- structure(list(Key = c("A", "A", "A", "A", "A", "B", "B", "B", 
"B", "B", "C", "C", "C", "C"), Year = c(2000L, 2001L, 2001L, 
2002L, 2003L, 2000L, 2001L, 2001L, 2002L, 2003L, 2000L, 2001L, 
2002L, 2003L), Type = c("ok", "ok", "notok", "ok", "ok", "ok", 
"ok", "ok", "ok", "ok", "ok", "ok", "ok", "ok")), class = "data.frame", row.names = c(NA, 
-14L))
