Using dplyr to select groups depending on the ratio of items present in a list

Question

Hello, I have a df such as:

Groups COL1
G1 horse
G1 donkey
G1 unknown
G1 snake 
G1 horse 
G2 dog
G2 dog
G2 unknown
G2 unknown
G3 donkey
G3 dog
G4 Mule
G4 dog
G4 cat 
G4 cat
G5 mule
G5 donkey
G5 mule

and a list

list_not_accepted=c("horse","donkey","mule")

The idea is to keep only the groups where (number of COL1 values present in list_not_accepted) / (total number of COL1 values in the group) < 0.6.

So here:

G1 3/5 = 0.6 so G1 does not pass (0.6 is not strictly below 0.6)
G2 0/4 = 0 so G2 passes
G3 1/2 = 0.5 so G3 passes
G4 1/4 = 0.25 so G4 passes
G5 3/3 = 1 so G5 does not pass

At the end we should get a df such as:

Groups COL1
G2 dog
G2 dog
G2 unknown
G2 unknown
G3 donkey
G3 dog
G4 Mule
G4 dog
G4 cat 
G4 cat

here are the data

> dput(tabl)
structure(list(Groups = c("G1", "G1", "G1", "G1", "G1", "G2", 
"G2", "G2", "G2", "G3", "G3", "G4", "G4", "G4", "G4", "G5", "G5", 
"G5"), COL1 = c("horse", "donkey", "unknown", "snake", "horse", 
"dog", "dog", "unknown", "unknown", "donkey", "dog", "Mule", 
"dog", "cat", "cat", "mule", "donkey", "mule")), row.names = c(NA, 
-18L), class = "data.frame")

Does anyone have an idea, please? Thank you very much!

Darren Tsai · Accepted Answer

A dplyr solution with filter():

library(dplyr)

# tabl is the data frame from the question's dput() output
tabl %>%
  group_by(Groups) %>%
  filter(sum(tolower(COL1) %in% list_not_accepted) / n() < 0.6)

# A tibble: 10 x 2
# Groups:   Groups [3]
#    Groups COL1   
#    <chr>  <chr>  
#  1 G2     dog    
#  2 G2     dog    
#  3 G2     unknown
#  4 G2     unknown
#  5 G3     donkey 
#  6 G3     dog    
#  7 G4     Mule   
#  8 G4     dog    
#  9 G4     cat    
# 10 G4     cat

The first element in G4 is "Mule". By your description it should match "mule" in list_not_accepted, so I convert COL1 to lower case with tolower() before matching.
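
To see why each group passes or fails, the per-group ratio can be computed with summarise() before filtering. This is only a sketch for checking the numbers, not part of the original answer; the column names n_rejected, n_total and ratio and the object name ratios are made up for illustration, and the data frame is rebuilt here so the snippet runs on its own.

```r
library(dplyr)

# Same data as the question's dput() output, rebuilt compactly
tabl <- data.frame(
  Groups = c(rep("G1", 5), rep("G2", 4), rep("G3", 2), rep("G4", 4), rep("G5", 3)),
  COL1 = c("horse", "donkey", "unknown", "snake", "horse",
           "dog", "dog", "unknown", "unknown",
           "donkey", "dog",
           "Mule", "dog", "cat", "cat",
           "mule", "donkey", "mule")
)
list_not_accepted <- c("horse", "donkey", "mule")

# One row per group: how many values are in the list, the group size,
# and the resulting ratio (column names are illustrative only)
ratios <- tabl %>%
  group_by(Groups) %>%
  summarise(
    n_rejected = sum(tolower(COL1) %in% list_not_accepted),
    n_total    = n(),
    ratio      = n_rejected / n_total
  )

print(ratios)
# ratio per group: G1 0.6, G2 0, G3 0.5, G4 0.25, G5 1
```

Only G2, G3 and G4 have a ratio strictly below 0.6, which matches the rows kept by filter() above. G4 reaches 0.25 only because tolower() lets "Mule" match "mule".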
