I am fairly new to R and am having some difficulty with what seems like it should be a pretty simple procedure. I have a data frame called "Bottom" containing the columns "Species", "Category", and "Y9" through "Y15" (signifying the years 2009-2015). The "Species" column contains fish names; "Category" contains the letter "B" all the way down, signifying bottom fish (this data frame was taken out of a larger one with many different categories of fish); and "Y9" through "Y15" contain the prices of the fish species in the first column:
Species          Category  Y9    Y10   Y11   Y12   Y13   Y14   Y15
Amberjack        B         2.65  NA    NA    NA    3.00  NA    3.31
Ambon emperor    B         2.62  2.63  NA    NA    3.75  3.06  3.00
Bigeye bream     B         2.62  2.21  2.86  NA    3.09  3.10  3.02
Bigeye scad      B         3.33  NA    2.81  2.51  2.62  3.00  2.77
Bigeye trevally  B         2.69  2.75  NA    NA    3.73  3.22  3.00
Black jack       B         2.66  2.52  2.55  3.00  3.75  3.26  3.42
I am trying to calculate three averages based on the following conditions:
1) the average of all fish species with "grouper" in the name
2) the average of all fish species with "snapper" in the name
3) the average of all other fish species, i.e. those matching neither of the above conditions.
I've found that I can get a vector of TRUE or FALSE values for my conditions with grepl, e.g. grepl("grouper", Bottom$Species), but I haven't figured out how to use it to tell R to calculate the average over only the rows where the vector is TRUE.
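A minimal base-R sketch of using that logical vector directly as a row index, assuming "average" means either one pooled mean over all non-missing prices in Y9 to Y15 or one mean per year column. The sample rows above are rebuilt as Bottom; because they contain no grouper or snapper, "jack" stands in as the pattern, and the names price_cols, is_match, overall_mean and yearly_means are illustrative only.

```r
# Sample rows from the question (prices for 2009-2015 in Y9..Y15)
Bottom <- data.frame(
  Species  = c("Amberjack", "Ambon emperor", "Bigeye bream",
               "Bigeye scad", "Bigeye trevally", "Black jack"),
  Category = "B",
  Y9  = c(2.65, 2.62, 2.62, 3.33, 2.69, 2.66),
  Y10 = c(NA,   2.63, 2.21, NA,   2.75, 2.52),
  Y11 = c(NA,   NA,   2.86, 2.81, NA,   2.55),
  Y12 = c(NA,   NA,   NA,   2.51, NA,   3.00),
  Y13 = c(3.00, 3.75, 3.09, 2.62, 3.73, 3.75),
  Y14 = c(NA,   3.06, 3.10, 3.00, 3.22, 3.26),
  Y15 = c(3.31, 3.00, 3.02, 2.77, 3.00, 3.42)
)

price_cols <- paste0("Y", 9:15)

# Logical vector: TRUE for rows whose Species contains the pattern.
# "jack" is a stand-in; use "grouper" or "snapper" on the full data.
is_match <- grepl("jack", Bottom$Species)

# Use the logical vector to keep only the TRUE rows, then pool every
# non-missing price of those rows into a single mean
overall_mean <- mean(as.matrix(Bottom[is_match, price_cols]), na.rm = TRUE)

# Alternatively, one mean per year column for the matching rows
yearly_means <- colMeans(Bottom[is_match, price_cols], na.rm = TRUE)

print(overall_mean)
print(yearly_means)
```

The na.rm = TRUE arguments matter here: the price columns contain NA, and without them both results would be NA.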
Any suggestions for this would be greatly appreciated.
Thank you!
If you don't need the averages appended to the original data frame, here's a sample using a modified version of your data:
a <- c("Amber jack", "Ambon emperor", "Bigeye bream", "Black jack")
b <- c(6, 4, 4, 1)
df <- data.frame(a, b)
Printing df shows:
a b
1 Amber jack 6
2 Ambon emperor 4
3 Bigeye bream 4
4 Black jack 1
Next, use filter() from dplyr together with your grepl expression to keep only the matching fish names, then summarise() to take the mean:
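library(dplyr)  # provides filter(), summarise() and the %>% pipe

df %>%
  filter(grepl("jack", a)) %>%   # refer to the column by name inside filter()
  summarise(jackmean = mean(b, na.rm = TRUE))  # na.rm = TRUE skips missing values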
returns
jackmean
1 3.5
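If all three averages are wanted in one step rather than one filter per pattern, a possible base-R sketch is to label each row first and then average by label. It reuses the toy df from above, with "jack" and "bream" standing in for "grouper" and "snapper"; the group column, the nested ifelse() and the label names are illustrative choices, not part of the original answer.

```r
a <- c("Amber jack", "Ambon emperor", "Bigeye bream", "Black jack")
b <- c(6, 4, 4, 1)
df <- data.frame(a, b)

# Label each row; the first matching pattern wins, so a name containing
# both words would be counted only once (in the first group)
df$group <- ifelse(grepl("jack", df$a), "jack",
                   ifelse(grepl("bream", df$a), "bream", "other"))

# One mean per label; na.rm = TRUE is passed through to mean()
group_means <- tapply(df$b, df$group, mean, na.rm = TRUE)
print(group_means)
```

The "other" label plays the role of the third average in the question: every row that matches neither pattern ends up there.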
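For the non-grouper, non-snapper mean, put a ! in front of grepl to keep the rows that do not match (to exclude both words at once, use a pattern such as "grouper|snapper"):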
df %>%
  filter(!grepl("jack", a)) %>%   # ! keeps the rows that do NOT match
  summarise(notjackmean = mean(b, na.rm = TRUE))
gives
notjackmean
1 4
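Negating a single word only excludes that word, while the question's third average must exclude two. A small base-R sketch of the same exclusion with a regular-expression alternation, using the toy df, where "jack|bream" stands in for "grouper|snapper"; the variable names neither and neither_mean are hypothetical.

```r
a <- c("Amber jack", "Ambon emperor", "Bigeye bream", "Black jack")
b <- c(6, 4, 4, 1)
df <- data.frame(a, b)

# "jack|bream" is TRUE when either word appears in the name;
# negating it keeps only rows containing neither word
neither <- !grepl("jack|bream", df$a)

# na.rm = TRUE so that missing prices in real data do not turn the result into NA
neither_mean <- mean(df$b[neither], na.rm = TRUE)
print(neither_mean)
```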