AdrieSC

Reputation: 459

How does one summarize with conditions into a single variable in R?

I would like to use summarise() from dplyr after grouping data to compute a new variable. However, I would like it to use one equation for some of the groups and a second equation for the rest.

I have tried using group_by() and summarise() with if_else(), but it isn't working.

Here's an example. Let's say, for some reason, I wanted to find a special value for sepal length. For the species 'setosa', this special value is twice the mean of the sepal length; for all other species it is simply the mean of the sepal length. This is the code I've tried, but it doesn't work with summarise():

library(dplyr)
iris %>%
   group_by(Species) %>%
   summarise(sepal_special = if_else(Species == "setosa", mean(Sepal.Length)*2, mean(Sepal.Length)))
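
A likely reason it fails: inside summarise(), `Species == "setosa"` is evaluated against every row of the group, so if_else() returns one value per row (50 per species here) instead of a single summary value. Depending on the dplyr version, this surfaces as an error or as 50 rows per group rather than one. A minimal check, assuming only dplyr and the built-in iris data:

```r
library(dplyr)

# Length of the condition vector that if_else() would see inside each group
cond_lengths <- iris %>%
  group_by(Species) %>%
  summarise(cond_length = length(Species == "setosa"))

print(cond_lengths)
```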

This idea works with mutate(), but then I get one row per observation and would need to reshape the result into the summary table I am looking for.

library(dplyr)
iris %>%
   group_by(Species) %>%
   mutate(sepal_special = if_else(Species == "setosa", mean(Sepal.Length)*2, mean(Sepal.Length)))
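
The reshaping step mentioned here can be sketched with distinct(): mutate() writes the same group-level value into every row of a group, so keeping one row per Species recovers the summary layout. A sketch, assuming dplyr; ungroup() is added so distinct() works on the plain columns:

```r
library(dplyr)

special <- iris %>%
  group_by(Species) %>%
  mutate(sepal_special = if_else(Species == "setosa",
                                 mean(Sepal.Length) * 2,
                                 mean(Sepal.Length))) %>%
  ungroup() %>%
  # every row within a species carries the same sepal_special,
  # so keeping the distinct (Species, sepal_special) pairs leaves one row per species
  distinct(Species, sepal_special)

print(special)
```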

This is how I want the resulting tibble to be laid out:

library(dplyr)
iris %>%
   group_by(Species) %>%
   summarise(sepal_special = mean(Sepal.Length))

# A tibble: 3 x 2
#   Species    sepal_special
#   <fctr>             <dbl>
# 1 setosa              5.01
# 2 versicolor          5.94
# 3 virginica           6.59

But in my result, the value for setosa should be doubled:

# A tibble: 3 x 2
#   Species    sepal_special
#   <fctr>             <dbl>
# 1 setosa             10.0 
# 2 versicolor          5.94
# 3 virginica           6.59
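
Another route to exactly this layout, offered as a sketch: summarise first so each species collapses to one row, then apply the condition with mutate(), where Species now has exactly one value per row. This assumes dplyr; after summarise() with a single grouping variable the result is no longer grouped, so the mutate() is row-wise:

```r
library(dplyr)

result <- iris %>%
  group_by(Species) %>%
  summarise(sepal_special = mean(Sepal.Length)) %>%
  # one row per species here, so the condition is a plain row-wise test
  mutate(sepal_special = if_else(Species == "setosa",
                                 sepal_special * 2,
                                 sepal_special))

print(result)
```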

Suggestions? I've searched for ways to use if_else() with summarise() but can't find any examples, which makes me think there must be a better way.

Thanks!

Upvotes: 4

Views: 106

Answers (2)

neilfws

Reputation: 33782

Another option: since twice the mean is the same as the mean of twice the values, you can double the sepal lengths for setosa and then summarise:

library(dplyr)
iris %>% 
  mutate(Sepal.Length = ifelse(Species == "setosa", 2*Sepal.Length, Sepal.Length)) %>% 
  group_by(Species) %>% 
  summarise(sepal_special = mean(Sepal.Length))
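
The identity this relies on, mean(2 * x) == 2 * mean(x), can be checked numerically on the setosa rows with base R only:

```r
setosa_len <- iris$Sepal.Length[iris$Species == "setosa"]

# Scaling every value by 2 scales the mean by 2 (the mean is linear)
lhs <- mean(2 * setosa_len)
rhs <- 2 * mean(setosa_len)

print(c(lhs = lhs, rhs = rhs))
```

The trick works because the mean is linear; it would not carry over unchanged to a summary such as var(), where doubling every value multiplies the result by 4.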

# A tibble: 3 x 2
  Species    sepal_special
  <fct>              <dbl>
1 setosa             10.0 
2 versicolor          5.94
3 virginica           6.59

Upvotes: 1

akrun

Reputation: 887048

After the mutate() step, use summarise() to take the first element of 'sepal_special' for each 'Species':

library(dplyr)
iris %>% 
  group_by(Species) %>% 
  mutate(sepal_special = if_else(Species == "setosa", 
                                 mean(Sepal.Length)*2, mean(Sepal.Length))) %>% 
  summarise(sepal_special = first(sepal_special))
# A tibble: 3 x 2
#  Species    sepal_special
#   <fctr>             <dbl>
#1 setosa             10.0 
#2 versicolor          5.94
#3 virginica           6.59

Or skip the mutate() step entirely and take the first element of the if_else() result directly inside summarise():

iris %>% 
   group_by(Species) %>%
   summarise(sepal_special = if_else(Species == "setosa", 
           mean(Sepal.Length)*2, mean(Sepal.Length))[1]) 
# A tibble: 3 x 2
#  Species    sepal_special
#  <fctr>             <dbl>
#1 setosa             10.0 
#2 versicolor          5.94
#3 virginica           6.59
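
Since the condition depends only on the group, a related option is to reduce it to a single value with dplyr's first() and branch with base R's `if`, so summarise() computes only one value per group and no `[1]` indexing is needed. A sketch, assuming dplyr:

```r
library(dplyr)

result2 <- iris %>%
  group_by(Species) %>%
  # first(Species) is a single value per group, so a scalar `if` is enough
  summarise(sepal_special = if (first(Species) == "setosa") {
    mean(Sepal.Length) * 2
  } else {
    mean(Sepal.Length)
  })

print(result2)
```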

Upvotes: 2
