How to use quantile function with dplyr summarize_at

Question

I'm trying to calculate 25, 50 and 75 percentile of all cuantitative variables grouped by the specie of the dataset iris, so using dplyr::summarize_at function is possible to do it just once. I use the following code but i allways get an error:

iris %>%
  group_by(Species) %>% 
  summarize_at(dplyr::vars(c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width")),
               .funs=c("25%"=quantile(0.25),
                       "50%"=quantile(0.50),
                       "75%"=quantile(0.75)))

This is the error i get: "Error: expecting a one sided formula, a function, or a function name."

Thank you for your help.

NelsonGon · Accepted Answer

Using the developer version of dplyr(0.8.9) we can use summarise with across. One drawback is that the names of the quantiles are not returned although we can know since we do our operations in the order we desire:

iris %>%
  group_by(Species) %>% 
   summarise(across(is.numeric,~c(`25%`=quantile(.x,0.25), `50%`=
                                       quantile(.x,0.5),
                                     `75%`= quantile(.x,0.75))))

The above is equivalent to:

 iris %>%
   group_by(Species) %>% 
   summarise_if(is.numeric,~c(`25%`=quantile(.x,0.25), `50%`=
                                       quantile(.x,0.5),
                                     `75%`= quantile(.x,0.75)))

Result:

# A tibble: 9 x 5
  Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
                                     
1 setosa             4.8         3.2          1.4          0.2
2 setosa             5           3.4          1.5          0.2
3 setosa             5.2         3.68         1.58         0.3
4 versicolor         5.6         2.52         4            1.2
5 versicolor         5.9         2.8          4.35         1.3
6 versicolor         6.3         3            4.6          1.5
7 virginica          6.22        2.8          5.1          1.8
8 virginica          6.5         3            5.55         2  
9 virginica          6.9         3.18         5.88         2.3

A possibility to add the names of the quantiles. Note however that dplyr and the tidyverse do not recycle vectors which means we'll have to hardcode this:

iris %>%
  group_by(Species) %>% 
   summarise_if(is.numeric,~c(`25%`=quantile(.x,0.25), `50%`=
                                       quantile(.x,0.5),
                                     `75%`= quantile(.x,0.75))) %>% 
   mutate(quant= rep(c("25%","50%","75%"),nrow(.) / 3))

You can also save the summarise result(res here) and resort to good ol' base for the recycle: res$quant <- c("25%","50%","75%")

# A tibble: 9 x 6
  Species    Sepal.Length Sepal.Width Petal.Length Petal.Width quant
                                      
1 setosa             4.8         3.2          1.4          0.2 25%  
2 setosa             5           3.4          1.5          0.2 50%  
3 setosa             5.2         3.68         1.58         0.3 75%  
4 versicolor         5.6         2.52         4            1.2 25%  
5 versicolor         5.9         2.8          4.35         1.3 50%  
6 versicolor         6.3         3            4.6          1.5 75%  
7 virginica          6.22        2.8          5.1          1.8 25%  
8 virginica          6.5         3            5.55         2   50%  
9 virginica          6.9         3.18         5.88         2.3 75%

How to use quantile function with dplyr summarize_at

Answers (2)

Related Questions