tadeufontes

Reputation: 477

How to make all values in a column the same, based on a single occurrence of a value in another column?

I have a data frame with species names and a grade from A to E assigned to each row. Sometimes the same species occurs more than once with different grades. What I want is the following: if a species has even one occurrence with grade X, then all other occurrences of that species must be set to grade X as well. This is my data frame:

species             | grade |
-----------------------------
Tilapia guineensis  | B     |
Tilapia guineensis  | E     |
Tilapia zillii      | A     |
Fundulus rubrifrons | A     |
Eutrigla gurnardus  | D     |
Sprattus sprattus   | A     |
Gadus morhua        | E     |
Gadus morhua        | B     |
Tilapia zillii      | C     |
Gadus morhua        | B     |
Eutrigla gurnardus  | C     |

So far I tried the following for grade E as an example:

df <- df %>%
  left_join(df %>%
              group_by(species) %>%
              summarize(sum_e = sum(grade == 'E')),
            by = 'species') %>%
  mutate(grade = ifelse(sum_e > 0, "E", grade))

But I get the error:

Error: `by` can't contain join column `species` which is missing from RHS

The output I want is basically this:

species             | grade |
-----------------------------
Tilapia guineensis  | E     |
Tilapia guineensis  | E     |
Tilapia zillii      | C     |
Fundulus rubrifrons | A     |
Eutrigla gurnardus  | D     |
Sprattus sprattus   | A     |
Gadus morhua        | E     |
Gadus morhua        | E     |
Tilapia zillii      | C     |
Gadus morhua        | B     |
Eutrigla gurnardus  | D     |

Upvotes: 0

Views: 34

Answers (1)

Jonny Phelps

Reputation: 2727

Here's how I would approach this using the data.table package. I think the steps would be much the same in dplyr, just written with different syntax.

# solution using data.table package
library(data.table)

# fake data, replace with yours
df <- data.frame(species=c("a", "a", "b", "b"),
                 grade=c("A", "E", "B", "C"))

# select your grade
dominant_grade <- "E"
# convert to data.table
dt <- as.data.table(df)
# within each species, add a column that flags whether any of its grades
# equals the dominant one
dt[, contains_dominant := any(grade == dominant_grade), by=species]
# for species where the dominant grade is present, set every grade to the
# dominant one
dt[contains_dominant == TRUE, grade := dominant_grade]

# convert back to data frame and trim for output
out <- setDF(dt[, .(species, grade)])
out
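
For comparison, here is a rough sketch of the same steps in dplyr. It is only one way to write it, using the same fake data and the same `dominant_grade` variable as above. I have prefixed the verbs with `dplyr::` on purpose: an error like the one in the question is often caused by plyr being loaded after dplyr, so that `summarize` resolves to the plyr version, which ignores the grouping and drops the `species` column. That is a guess about the asker's session, not something I can confirm from the post.

```r
# sketch of a dplyr equivalent (assumes dplyr is installed)
library(dplyr)

# fake data, replace with yours
df <- data.frame(species = c("a", "a", "b", "b"),
                 grade = c("A", "E", "B", "C"),
                 stringsAsFactors = FALSE)

# select your grade
dominant_grade <- "E"

out <- df %>%
  dplyr::group_by(species) %>%
  # if any row of the species has the dominant grade, overwrite the whole
  # group with it; otherwise keep the grades as they are
  dplyr::mutate(grade = if (any(grade == dominant_grade)) dominant_grade else grade) %>%
  dplyr::ungroup()

out
```

With the fake data, both rows of species "a" end up with grade "E" and the rows of species "b" are left unchanged. No join is needed, because the grouped `mutate` does the check and the overwrite in one step.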

Upvotes: 1
