Reputation: 459
I would like to use summarise() from dplyr after grouping data to compute a new variable, but I would like it to use one equation for some of the data and a second equation for the rest of the data.
I have tried using group_by() and summarise() with if_else(), but it isn't working.
Here's an example. Let's say, for some reason, I wanted to find a special value for sepal length. For the species 'setosa' this special value is twice the mean of the sepal length. For all of the other species it is simply the mean of sepal length. This is the code I've tried, but it doesn't work with summarise():
library(dplyr)
iris %>%
  group_by(Species) %>%
  summarise(sepal_special = if_else(Species == "setosa", mean(Sepal.Length)*2, mean(Sepal.Length)))
This idea works with mutate(), but I would then need to re-format the tibble to get the dataset I am looking for:
library(dplyr)
iris %>%
  group_by(Species) %>%
  mutate(sepal_special = if_else(Species == "setosa", mean(Sepal.Length)*2, mean(Sepal.Length)))
This is how I want the resulting tibble to be laid out:
library(dplyr)
iris %>%
  group_by(Species) %>%
  summarise(sepal_mean = mean(Sepal.Length))
# A tibble: 3 x 2
#   Species    sepal_mean
#   <fctr>          <dbl>
# 1 setosa           5.01
# 2 versicolor       5.94
# 3 virginica        6.59
But my result would show the value for setosa multiplied by 2:
# A tibble: 3 x 2
#   Species    sepal_special
#   <fctr>             <dbl>
# 1 setosa              10.0
# 2 versicolor          5.94
# 3 virginica           6.59
Suggestions? I feel like I've really searched for ways to use if_else() with summarise() but can't find it anywhere, which means there must be a better way.
Thanks!
Upvotes: 4
Views: 106
Reputation: 33782
Another option: since twice the mean is the same as the mean of twice the values, you can double the sepal lengths for setosa and then summarise:
library(dplyr)
iris %>%
  mutate(Sepal.Length = ifelse(Species == "setosa", 2*Sepal.Length, Sepal.Length)) %>%
  group_by(Species) %>%
  summarise(sepal_special = mean(Sepal.Length))
# A tibble: 3 x 2
  Species    sepal_special
  <fct>              <dbl>
1 setosa              10.0
2 versicolor          5.94
3 virginica           6.59
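The identity this answer relies on can be checked directly. A minimal sketch using the built-in iris data; the variable names setosa_len, twice_mean and mean_twice are only illustrative and not part of the original answer:

```r
# Sepal lengths for setosa only (iris ships with base R)
setosa_len <- iris$Sepal.Length[iris$Species == "setosa"]

# Averaging first and doubling afterwards ...
twice_mean <- 2 * mean(setosa_len)
# ... versus doubling every value and then averaging
mean_twice <- mean(2 * setosa_len)

# all.equal() compares with a small numeric tolerance
isTRUE(all.equal(twice_mean, mean_twice))
```

Note that this shortcut only works because the special rule is a constant multiple of the mean; a rule that is not linear in the values would not survive being moved inside mean().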
Upvotes: 1
Reputation: 887048
After the mutate step, use summarise to get the first element of 'sepal_special' for each 'Species':
iris %>%
  group_by(Species) %>%
  mutate(sepal_special = if_else(Species == "setosa",
                                 mean(Sepal.Length)*2, mean(Sepal.Length))) %>%
  summarise(sepal_special = first(sepal_special))
# A tibble: 3 x 2
#   Species    sepal_special
#   <fctr>             <dbl>
# 1 setosa              10.0
# 2 versicolor          5.94
# 3 virginica           6.59
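Or skip the mutate step and do it all inside summarise. Because Species has one entry per row, if_else returns one value per row of the group, while summarise expects a single value per group, so take the first value of the result with [1]: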
iris %>%
  group_by(Species) %>%
  summarise(sepal_special = if_else(Species == "setosa",
                                    mean(Sepal.Length)*2, mean(Sepal.Length))[1])
# A tibble: 3 x 2
#   Species    sepal_special
#   <fctr>             <dbl>
# 1 setosa              10.0
# 2 versicolor          5.94
# 3 virginica           6.59
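A further variant, offered only as a sketch and not taken from the answer above: reverse the order, so that summarise produces one plain mean per species and the if_else then runs on that three-row table, where the condition is already one value per row. The names result and sepal_mean are illustrative assumptions.

```r
library(dplyr)

result <- iris %>%
  group_by(Species) %>%
  # Step 1: collapse each species to a single row with its plain mean
  summarise(sepal_mean = mean(Sepal.Length)) %>%
  # Step 2: apply the species-specific rule to the summarised rows
  mutate(sepal_special = if_else(Species == "setosa", sepal_mean * 2, sepal_mean)) %>%
  # Keep only the columns shown in the desired layout
  select(Species, sepal_special)

result
```

This should give the same three-row layout as the two approaches above, and it keeps the conditional logic separate from the aggregation.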
Upvotes: 2