D.B
D.B

Reputation: 15

How to use quantile function with dplyr summarize_at

I'm trying to calculate 25, 50 and 75 percentile of all cuantitative variables grouped by the specie of the dataset iris, so using dplyr::summarize_at function is possible to do it just once. I use the following code but i allways get an error:

iris %>%
  group_by(Species) %>% 
  summarize_at(dplyr::vars(c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width")),
               .funs=c("25%"=quantile(0.25),
                       "50%"=quantile(0.50),
                       "75%"=quantile(0.75)))

This is the error i get: "Error: expecting a one sided formula, a function, or a function name."

Thank you for your help.

Upvotes: 1

Views: 550

Answers (2)

NelsonGon
NelsonGon

Reputation: 13319

Using the developer version of dplyr(0.8.9) we can use summarise with across. One drawback is that the names of the quantiles are not returned although we can know since we do our operations in the order we desire:

iris %>%
  group_by(Species) %>% 
   summarise(across(is.numeric,~c(`25%`=quantile(.x,0.25), `50%`=
                                       quantile(.x,0.5),
                                     `75%`= quantile(.x,0.75))))

The above is equivalent to:

 iris %>%
   group_by(Species) %>% 
   summarise_if(is.numeric,~c(`25%`=quantile(.x,0.25), `50%`=
                                       quantile(.x,0.5),
                                     `75%`= quantile(.x,0.75)))

Result:

# A tibble: 9 x 5
  Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
  <fct>             <dbl>       <dbl>        <dbl>       <dbl>
1 setosa             4.8         3.2          1.4          0.2
2 setosa             5           3.4          1.5          0.2
3 setosa             5.2         3.68         1.58         0.3
4 versicolor         5.6         2.52         4            1.2
5 versicolor         5.9         2.8          4.35         1.3
6 versicolor         6.3         3            4.6          1.5
7 virginica          6.22        2.8          5.1          1.8
8 virginica          6.5         3            5.55         2  
9 virginica          6.9         3.18         5.88         2.3

A possibility to add the names of the quantiles. Note however that dplyr and the tidyverse do not recycle vectors which means we'll have to hardcode this:

iris %>%
  group_by(Species) %>% 
   summarise_if(is.numeric,~c(`25%`=quantile(.x,0.25), `50%`=
                                       quantile(.x,0.5),
                                     `75%`= quantile(.x,0.75))) %>% 
   mutate(quant= rep(c("25%","50%","75%"),nrow(.) / 3))

You can also save the summarise result(res here) and resort to good ol' base for the recycle: res$quant <- c("25%","50%","75%")

# A tibble: 9 x 6
  Species    Sepal.Length Sepal.Width Petal.Length Petal.Width quant
  <fct>             <dbl>       <dbl>        <dbl>       <dbl> <chr>
1 setosa             4.8         3.2          1.4          0.2 25%  
2 setosa             5           3.4          1.5          0.2 50%  
3 setosa             5.2         3.68         1.58         0.3 75%  
4 versicolor         5.6         2.52         4            1.2 25%  
5 versicolor         5.9         2.8          4.35         1.3 50%  
6 versicolor         6.3         3            4.6          1.5 75%  
7 virginica          6.22        2.8          5.1          1.8 25%  
8 virginica          6.5         3            5.55         2   50%  
9 virginica          6.9         3.18         5.88         2.3 75% 

Upvotes: 2

linog
linog

Reputation: 6226

I can propose you a data.table solution. Unfortunately, I don't have a dplyr solution in mind.

dt <- data.table::as.data.table(iris)

dt <- dt[,lapply(.SD, quantile, probs = c(.25,.5,.75)),
   .SDcols = c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width"),
   by = "Species"]

dt[,'quantile' := c("25%","50%","75%")]

# Sepal.Length Sepal.Width Petal.Length Petal.Width # Species Sepal.Length Sepal.Width Petal.Length Petal.Width quantile
# 1:     setosa        4.800       3.200        1.400         0.2      25%
# 2:     setosa        5.000       3.400        1.500         0.2      50%
# 3:     setosa        5.200       3.675        1.575         0.3      75%
# 4: versicolor        5.600       2.525        4.000         1.2      25%
# 5: versicolor        5.900       2.800        4.350         1.3      50%
# 6: versicolor        6.300       3.000        4.600         1.5      75%
# 7:  virginica        6.225       2.800        5.100         1.8      25%
# 8:  virginica        6.500       3.000        5.550         2.0      50%
# 9:  virginica        6.900       3.175        5.875         2.3      75%

Hope that helps!

Upvotes: 2

Related Questions