Reputation: 11
I'd like to add a column to my data set that is populated with the average of the values in a given category.
Using the iris data set as an example:
library(datasets)
library(dplyr)  # provides mutate()
head(iris)
unique(iris$Species)
mutate(iris, spec_avg_pet_width=1)
I'd like to replace the placeholder "1" with the average petal width for the Species of that row (setosa, versicolor or virginica). I will ultimately be using this average as a way to arrange categories in a ggplot. The Excel equivalent of this would be an AVERAGEIFS function.
I know how to do a conditional mean based on an absolute value:
mean(iris[iris$Species =='setosa','Petal.Width'])
But I cannot figure out how to do a conditional mean based on a relative value, i.e. the category of each row.
Essentially I would like my new column to return the values of 0.246, 2.026 or 1.326 depending on whether the row is setosa, virginica or versicolor.
Thank you!
Upvotes: 0
Views: 19
Reputation: 173858
This is exactly what the base R function ave does:
ave(iris$Petal.Width, iris$Species)
#> [1] 0.246 0.246 0.246 0.246 0.246 0.246 0.246 0.246 0.246 0.246 0.246 0.246
#> [13] 0.246 0.246 0.246 0.246 0.246 0.246 0.246 0.246 0.246 0.246 0.246 0.246
#> [25] 0.246 0.246 0.246 0.246 0.246 0.246 0.246 0.246 0.246 0.246 0.246 0.246
#> [37] 0.246 0.246 0.246 0.246 0.246 0.246 0.246 0.246 0.246 0.246 0.246 0.246
#> [49] 0.246 0.246 1.326 1.326 1.326 1.326 1.326 1.326 1.326 1.326 1.326 1.326
#> [61] 1.326 1.326 1.326 1.326 1.326 1.326 1.326 1.326 1.326 1.326 1.326 1.326
#> [73] 1.326 1.326 1.326 1.326 1.326 1.326 1.326 1.326 1.326 1.326 1.326 1.326
#> [85] 1.326 1.326 1.326 1.326 1.326 1.326 1.326 1.326 1.326 1.326 1.326 1.326
#> [97] 1.326 1.326 1.326 1.326 2.026 2.026 2.026 2.026 2.026 2.026 2.026 2.026
#> [109] 2.026 2.026 2.026 2.026 2.026 2.026 2.026 2.026 2.026 2.026 2.026 2.026
#> [121] 2.026 2.026 2.026 2.026 2.026 2.026 2.026 2.026 2.026 2.026 2.026 2.026
#> [133] 2.026 2.026 2.026 2.026 2.026 2.026 2.026 2.026 2.026 2.026 2.026 2.026
#> [145] 2.026 2.026 2.026 2.026 2.026 2.026
So, for example, to get your new column you could do:
within(iris, spec_avg_pet_width <- ave(Petal.Width, Species))
Created on 2022-09-23 with reprex v2.0.2
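Since the question starts from mutate, here is a minimal sketch of the same idea in dplyr, assuming that package is installed. The name iris_with_avg is only illustrative; the column name is taken from the question. Grouping by Species makes mean() run once per group, and mutate repeats that value on every row of the group:

```r
library(dplyr)

# Group by Species so mean() is computed within each group,
# then ungroup so later operations act on the whole data frame.
iris_with_avg <- iris %>%
  group_by(Species) %>%
  mutate(spec_avg_pet_width = mean(Petal.Width)) %>%
  ungroup()

# One row per species, to check the repeated averages.
distinct(iris_with_avg, Species, spec_avg_pet_width)
```

Row order is unchanged, so the new column should line up with the ave() result above.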
Upvotes: 0