Reputation: 477
So I have a data frame with species names and grades from A-E attributed to each. Sometimes the same species occurs with different grades, but I want the following: if a species has even one occurrence with grade X, then all other occurrences of that species must be grade X as well. This is my data frame:
species | grade |
-----------------------------------
Tilapia guineensis | B |
Tilapia guineensis | E |
Tilapia zillii | A |
Fundulus rubrifrons | A |
Eutrigla gurnardus | D |
Sprattus sprattus | A |
Gadus morhua | E |
Gadus morhua | B |
Tilapia zillii | C |
Gadus morhua | B |
Eutrigla gurnardus | C |
So far I tried the following for grade E as an example:
df <- df %>%
  left_join(df %>%
              group_by(species) %>%
              summarize(sum_e = sum(grade == 'E')),
            by = 'species') %>%
  mutate(grade = ifelse(sum_e > 0, "E", grade))
But I get the error:
Error: `by` can't contain join column `species` which is missing from RHS
The output I want is basically this:
species | grade |
-----------------------------------
Tilapia guineensis | E |
Tilapia guineensis | E |
Tilapia zillii | C |
Fundulus rubrifrons | A |
Eutrigla gurnardus | D |
Sprattus sprattus | A |
Gadus morhua | E |
Gadus morhua | E |
Tilapia zillii | C |
Gadus morhua | B |
Eutrigla gurnardus | D |
Upvotes: 0
Views: 34
Reputation: 2727
Here's how I would approach this using the data.table package. I think that if you switch to dplyr the stages would be similar, just written differently.
# solution using data.table package
library(data.table)
# fake data, replace with yours
df <- data.frame(species=c("a", "a", "b", "b"),
grade=c("A", "E", "B", "C"))
# select your grade
dominant_grade <- "E"
# convert to data.table
dt <- as.data.table(df)
# search over species, add a column that checks if any of the grades is equal
# to the dominant one
dt[, contains_dominant := any(grade == dominant_grade), by=species]
# For cases where the dominant one is present, set all the grades to the dominant
# one
dt[contains_dominant == TRUE, grade := dominant_grade]
# convert back to data frame and trim for output
out <- setDF(dt[, .(species, grade)])
out
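For comparison, here is a minimal sketch of the same stages in dplyr. It is not tested against your real data; it assumes `grade` is a character column (not a factor) and reuses the fake `df` and the `dominant_grade` variable from above. Within each species group, it checks whether any grade equals the dominant one and, if so, overwrites every grade in that group.

```r
# dplyr sketch of the same stages (assumes grade is character, not factor)
library(dplyr)

# fake data, replace with yours
df <- data.frame(species = c("a", "a", "b", "b"),
                 grade = c("A", "E", "B", "C"),
                 stringsAsFactors = FALSE)

# select your grade
dominant_grade <- "E"

out <- df %>%
  group_by(species) %>%
  # per species: if any grade is the dominant one, use it for the whole
  # group (the single value is recycled), otherwise keep the grades as-is
  mutate(grade = if (any(grade == dominant_grade)) dominant_grade else grade) %>%
  ungroup() %>%
  as.data.frame()

out
```

With the fake data, both rows of species "a" end up with grade "E" and species "b" keeps "B" and "C". Like the data.table version, this handles one dominant grade at a time.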
Upvotes: 1