Reputation: 57
I'm trying to figure out a way to find specific values based on each factor within R. In other words, how can I keep all rows belonging to a factor level that satisfies a certain condition, even if a specific row fails the condition but its factor level passes the condition on another row?
So I have something like this:
gender values fruit
1 M 20 apple
2 M 22 pear
3 F 24 mango
4 F 19 mango
5 F 9 mango
6 F 17 apple
7 M 18 banana
8 M 22 banana
9 M 12 banana
10 M 14 mango
11 F 7 apple
12 F 8 apple
I want every row whose fruit has at least one F gender (even if that fruit also has some M's). It's also possible to have more than two genders, such as neutral (not shown). So my ideal output would be this:
gender values fruit
1 M 20 apple
3 F 24 mango
4 F 19 mango
5 F 9 mango
6 F 17 apple
10 M 14 mango
11 F 7 apple
12 F 8 apple
Notice that banana and pear are missing; that's because those fruits ONLY have M's and no F's. Also, rows 1 and 10 are still there even though they are M's, because there are other apples and mangos that have F's, so the condition still applies. Please let me know if this is possible. Thank you!
Below is my code for replicating this data:
gender <- c("M","M","F","F","F","F","M","M","M","M","F","F")
values <- c(20,22,24,19,9,17,18,22,12,14,7,8)
fruit <- c("apple","pear","mango","mango","mango","apple","banana","banana","banana","mango","apple","apple")
df <- data.frame(gender, values, fruit)
Here's what I've tried so far:
df[duplicated(df[,c("fruit","gender")]),]
ave(df$gender, df$fruit, FUN=function(x) ifelse(x=='F','yes','no'))
Also, third-party libraries are welcome, but I prefer to stay within base R (packages stats and plyr are fine as I have those on my system).
Upvotes: 1
Views: 93
Reputation: 7190
The base R and data.table solutions are covered in the other answers, so here I provide the dplyr solution, even though some outputs are different (at least in the order of the results).
library(dplyr)
df %>% group_by(fruit) %>% filter(any(gender == "F"))
Source: local data frame [8 x 3]
Groups: fruit
gender values fruit
1 M 20 apple
2 F 24 mango
3 F 19 mango
4 F 9 mango
5 F 17 apple
6 M 14 mango
7 F 7 apple
8 F 8 apple
Upvotes: 1
Reputation: 92302
A possible data.table approach:
library(data.table)
setDT(df)[, if(any(gender == "F")) .SD, by = fruit]
# fruit gender values
# 1: apple M 20
# 2: apple F 17
# 3: apple F 7
# 4: apple F 8
# 5: mango F 24
# 6: mango F 19
# 7: mango F 9
# 8: mango M 14
I like the other approach, so here's a data.table equivalent using a binary join:
setkey(setDT(df), fruit)[.(unique(df[gender == "F", fruit], by = "fruit"))]
# gender values fruit
# 1: F 17 apple
# 2: F 7 apple
# 3: F 8 apple
# 4: M 20 apple
# 5: F 24 mango
# 6: F 19 mango
# 7: F 9 mango
# 8: M 14 mango
Upvotes: 3
Reputation: 28461
df[df$fruit %in% unique(df[df$gender =='F', ]$fruit),]
# gender values fruit
#1 M 20 apple
#3 F 24 mango
#4 F 19 mango
#5 F 9 mango
#6 F 17 apple
#10 M 14 mango
#11 F 7 apple
#12 F 8 apple
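For comparison, the ave() attempt from the question can probably be made to work along the same lines. The following is only a sketch in base R: the names has_f and result are introduced here for illustration, and stringsAsFactors = FALSE is an assumption added so the columns stay plain character vectors.

```r
# Recreate the data from the question
gender <- c("M","M","F","F","F","F","M","M","M","M","F","F")
values <- c(20,22,24,19,9,17,18,22,12,14,7,8)
fruit <- c("apple","pear","mango","mango","mango","apple",
           "banana","banana","banana","mango","apple","apple")
df <- data.frame(gender, values, fruit, stringsAsFactors = FALSE)

# Within each fruit, any() reduces the row-wise test to a single TRUE/FALSE,
# and ave() spreads that value back over every row of the group.
has_f <- ave(df$gender == "F", df$fruit, FUN = any)

# Keep the rows whose fruit has at least one "F"; the original row order is kept.
result <- df[has_f, ]
print(result)
```

Because the logical vector lines up with the original rows, this keeps the original row names (1, 3, 4, 5, 6, 10, 11, 12), like the %in% version above.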
Upvotes: 3