Chang Park

Reputation: 57

R - Find rows based on group factors

I'm trying to find rows based on a condition that is evaluated per factor level in R. In other words, how can I keep all rows of a factor level when at least one row in that level satisfies a condition, even if a specific row fails the condition itself?

So I have something like this:

   gender values  fruit
1       M     20  apple
2       M     22   pear
3       F     24  mango
4       F     19  mango
5       F      9  mango
6       F     17  apple
7       M     18 banana
8       M     22 banana
9       M     12 banana
10      M     14  mango
11      F      7  apple
12      F      8  apple

I want every row whose fruit has at least one F gender (even if that fruit also has some M's). There may also be more than two genders, such as neutral (not shown). So my ideal output would be this:

   gender values  fruit
1       M     20  apple
3       F     24  mango
4       F     19  mango
5       F      9  mango
6       F     17  apple
10      M     14  mango
11      F      7  apple
12      F      8  apple

Notice that banana and pear are missing because those fruits have only M's and no F's. Rows 1 and 10 are kept even though they are M's, because other apple and mango rows have F's. Please let me know if this is possible. Thank you!

Below is my code for replicating this data:

gender <- c("M","M","F","F","F","F","M","M","M","M","F","F")
values <- c(20,22,24,19,9,17,18,22,12,14,7,8)
fruit <- c("apple","pear","mango","mango","mango","apple","banana","banana","banana","mango","apple","apple")
df <- data.frame(gender, values, fruit)

Here's what I've tried so far:

df[duplicated(df[,c("fruit","gender")]),]
ave(df$gender, df$fruit, FUN=function(x) ifelse(x=='F','yes','no'))

Third-party libraries are welcome, but I'd prefer to stay within R itself (the stats and plyr packages are fine, as I have those on my system).

Upvotes: 1

Views: 93

Answers (3)

SabDeM

Reputation: 7190

Base R and data.table solutions have already been posted, so here is a dplyr solution. Note that the outputs differ somewhat between the approaches (at least in the order of the rows).

library(dplyr)
df %>% group_by(fruit) %>% filter(any(gender == "F"))
# Source: local data frame [8 x 3]
# Groups: fruit
#
#   gender values fruit
# 1      M     20 apple
# 2      F     24 mango
# 3      F     19 mango
# 4      F      9 mango
# 5      F     17 apple
# 6      M     14 mango
# 7      F      7 apple
# 8      F      8 apple

Upvotes: 1

David Arenburg

Reputation: 92302

A possible data.table approach:

library(data.table)
setDT(df)[, if(any(gender == "F")) .SD, by = fruit]
#    fruit gender values
# 1: apple      M     20
# 2: apple      F     17
# 3: apple      F      7
# 4: apple      F      8
# 5: mango      F     24
# 6: mango      F     19
# 7: mango      F      9
# 8: mango      M     14

I like the other approach, so here's a data.table equivalent using a binary join:

setkey(setDT(df), fruit)[.(unique(df[gender == "F", fruit], by = "fruit"))]
#    gender values fruit
# 1:      F     17 apple
# 2:      F      7 apple
# 3:      F      8 apple
# 4:      M     20 apple
# 5:      F     24 mango
# 6:      F     19 mango
# 7:      F      9 mango
# 8:      M     14 mango
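
For readers without data.table, the same join idea can be sketched in base R with merge(). This is only an illustrative sketch, not part of the original answer: the names f_fruits and result are made up for the example, and the data is recreated from the question so the snippet runs on its own.

```r
# Recreate the example data from the question.
gender <- c("M","M","F","F","F","F","M","M","M","M","F","F")
values <- c(20,22,24,19,9,17,18,22,12,14,7,8)
fruit <- c("apple","pear","mango","mango","mango","apple",
           "banana","banana","banana","mango","apple","apple")
df <- data.frame(gender, values, fruit)

# One-column data frame of the fruits that occur with gender "F" at least once
# (f_fruits is a hypothetical helper name).
f_fruits <- unique(df[df$gender == "F", "fruit", drop = FALSE])

# Inner join on fruit: only rows whose fruit appears in f_fruits survive.
result <- merge(df, f_fruits, by = "fruit")
result
```

Note that merge() moves the join column to the front, sorts by it, and discards the original row names, so the rows come back in a different order than in the question's ideal output.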

Upvotes: 3

Pierre L

Reputation: 28461

df[df$fruit %in% unique(df[df$gender =='F', ]$fruit),]
#   gender values fruit
#1       M     20 apple
#3       F     24 mango
#4       F     19 mango
#5       F      9 mango
#6       F     17 apple
#10      M     14 mango
#11      F      7 apple
#12      F      8 apple
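
The ave() attempt in the question was close. As a minimal sketch (not part of the original answer), passing any as FUN marks every row of a fruit as TRUE when at least one row of that fruit is "F". The names keep and result are made up for this example, and the data is recreated from the question so the snippet runs on its own.

```r
# Recreate the example data from the question.
gender <- c("M","M","F","F","F","F","M","M","M","M","F","F")
values <- c(20,22,24,19,9,17,18,22,12,14,7,8)
fruit <- c("apple","pear","mango","mango","mango","apple",
           "banana","banana","banana","mango","apple","apple")
df <- data.frame(gender, values, fruit)

# Within each fruit, any() collapses the per-row test to a single TRUE/FALSE,
# and ave() spreads that value back over every row of the group.
keep <- ave(df$gender == "F", df$fruit, FUN = any)

# Logical indexing keeps the original row order and row names.
result <- df[keep, ]
result
```

Because the test is gender == "F" rather than gender != "M", this should also behave as intended if further gender levels such as neutral are present.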

Upvotes: 3
