Anthony

Reputation: 65

Conditional mean of one column based on words contained in another column, for multiple columns

I am fairly new to R and am having some difficulty with what seems like it should be a pretty simple procedure. I have a data frame called "Bottom" containing the columns "Species", "Category", and "Y9:Y15" (signifying the years 2009-2015). The "Species" column contains fish names, "Category" contains the letter "B" all the way down, signifying bottom fish (this data frame was taken out of a larger one with many different categories of fish), and "Y9:Y15" contain the prices of the fish species in the first column:

         Species  Category   Y9  Y10  Y11  Y12  Y13  Y14  Y15
       Amberjack         B 2.65   NA   NA   NA 3.00   NA 3.31
   Ambon emperor         B 2.62 2.63   NA   NA 3.75 3.06 3.00
    Bigeye bream         B 2.62 2.21 2.86   NA 3.09 3.10 3.02
     Bigeye scad         B 3.33   NA 2.81 2.51 2.62 3.00 2.77
 Bigeye trevally         B 2.69 2.75   NA   NA 3.73 3.22 3.00
      Black jack         B 2.66 2.52 2.55 3.00 3.75 3.26 3.42

I am trying to calculate 3 averages based on the following three conditions:

1) the average of all fish species with "grouper" in the name
2) the average of all fish species with "snapper" in the name
3) the average of all other fish species that meet neither of the above conditions.

I've found that I can get a vector of TRUE or FALSE for my conditions with grepl: grepl("grouper", Bottom$Species), but I haven't figured out how to use this in a function telling R to calculate the average based on the "TRUE" values of the vector.

Any suggestions for this would be greatly appreciated.

Thank you!

Upvotes: 1

Views: 973

Answers (1)

JPHwang

Reputation: 136

If you don't need the averages appended to the original data frame, here's an example using a modified version of your data:

a <- c("Amber jack", "Ambon emperor", "Bigeye bream", "Black jack")
b <- c(6, 4, 4, 1)

df <- data.frame(a, b)

df shows

              a b
1    Amber jack 6
2 Ambon emperor 4
3  Bigeye bream 4
4    Black jack 1

Next, use filter from dplyr together with your grepl expression to keep only the rows whose names match:

library(dplyr)

df %>% 
  filter(grepl("jack", a)) %>% 
  summarise(jackmean = mean(b))

returns

  jackmean
1      3.5
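
If you would rather stay in base R, the logical vector from grepl can be used directly as a row index. This is only a minimal sketch on the same toy data; the names is_jack and jack_mean are made up for illustration, and na.rm = TRUE is added on the assumption that your real price columns contain NA values, as the table in the question suggests.

```r
# Same toy data as above
a <- c("Amber jack", "Ambon emperor", "Bigeye bream", "Black jack")
b <- c(6, 4, 4, 1)
df <- data.frame(a, b)

# TRUE for rows whose name contains "jack" (hypothetical helper name)
is_jack <- grepl("jack", df$a)

# Average only the rows flagged TRUE; na.rm = TRUE skips missing values
jack_mean <- mean(df$b[is_jack], na.rm = TRUE)
print(jack_mean)
```

Without na.rm = TRUE, a single NA in the selected rows would make the whole mean NA.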

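For the mean of the species that are neither grouper nor snapper, put a ! in front of grepl to negate the match (with your data the pattern would be "grouper|snapper"). With the sample data: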

df %>% 
  filter(!grepl("jack", a)) %>% 
  summarise(notjackmean = mean(b))

gives

  notjackmean
1           4
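
To get all three averages for every year column at once, one possible approach is to label each row with its group and then aggregate. The sketch below uses base R only. The species names and prices are invented for illustration (only two year columns are shown), and the Group column, year_cols and group_means are hypothetical names, so treat it as a starting point rather than a drop-in solution.

```r
# Made-up sample in the same shape as the "Bottom" data frame
Bottom <- data.frame(
  Species  = c("Black grouper", "Red snapper", "Amberjack",
               "Giant grouper", "Bigeye scad"),
  Category = "B",
  Y9       = c(4.00, 5.00, 2.65, 6.00, 3.33),
  Y10      = c(NA,   5.50, NA,   6.50, 3.00),
  stringsAsFactors = FALSE
)

# Label each row as "grouper", "snapper" or "other";
# ignore.case = TRUE so capitalised names also match
Bottom$Group <- ifelse(
  grepl("grouper", Bottom$Species, ignore.case = TRUE), "grouper",
  ifelse(grepl("snapper", Bottom$Species, ignore.case = TRUE),
         "snapper", "other")
)

# Pick up every column named Y followed by digits (Y9, Y10, ...)
year_cols <- grep("^Y[0-9]+$", names(Bottom), value = TRUE)

# Mean of each year column within each group, skipping NA prices
group_means <- aggregate(Bottom[year_cols],
                         by = list(Group = Bottom$Group),
                         FUN = mean, na.rm = TRUE)
print(group_means)
```

If a group has no non-missing prices in a given year, mean(..., na.rm = TRUE) returns NaN for that cell.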

Upvotes: 1
