Reputation: 15
I'm trying to calculate 25, 50 and 75 percentile of all cuantitative variables grouped by the specie of the dataset iris, so using dplyr::summarize_at function is possible to do it just once. I use the following code but i allways get an error:
iris %>%
group_by(Species) %>%
summarize_at(dplyr::vars(c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width")),
.funs=c("25%"=quantile(0.25),
"50%"=quantile(0.50),
"75%"=quantile(0.75)))
This is the error i get: "Error: expecting a one sided formula, a function, or a function name."
Thank you for your help.
Upvotes: 1
Views: 550
Reputation: 13319
Using the developer version of dplyr(0.8.9) we can use summarise
with across
. One drawback is that the names of the quantiles are not returned although we can know since we do our operations in the order we desire:
iris %>%
group_by(Species) %>%
summarise(across(is.numeric,~c(`25%`=quantile(.x,0.25), `50%`=
quantile(.x,0.5),
`75%`= quantile(.x,0.75))))
The above is equivalent to:
iris %>%
group_by(Species) %>%
summarise_if(is.numeric,~c(`25%`=quantile(.x,0.25), `50%`=
quantile(.x,0.5),
`75%`= quantile(.x,0.75)))
Result:
# A tibble: 9 x 5
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
<fct> <dbl> <dbl> <dbl> <dbl>
1 setosa 4.8 3.2 1.4 0.2
2 setosa 5 3.4 1.5 0.2
3 setosa 5.2 3.68 1.58 0.3
4 versicolor 5.6 2.52 4 1.2
5 versicolor 5.9 2.8 4.35 1.3
6 versicolor 6.3 3 4.6 1.5
7 virginica 6.22 2.8 5.1 1.8
8 virginica 6.5 3 5.55 2
9 virginica 6.9 3.18 5.88 2.3
A possibility to add the names of the quantiles. Note however that dplyr
and the tidyverse
do not recycle vectors which means we'll have to hardcode this:
iris %>%
group_by(Species) %>%
summarise_if(is.numeric,~c(`25%`=quantile(.x,0.25), `50%`=
quantile(.x,0.5),
`75%`= quantile(.x,0.75))) %>%
mutate(quant= rep(c("25%","50%","75%"),nrow(.) / 3))
You can also save the summarise
result(res
here) and resort to good ol' base
for the recycle: res$quant <- c("25%","50%","75%")
# A tibble: 9 x 6
Species Sepal.Length Sepal.Width Petal.Length Petal.Width quant
<fct> <dbl> <dbl> <dbl> <dbl> <chr>
1 setosa 4.8 3.2 1.4 0.2 25%
2 setosa 5 3.4 1.5 0.2 50%
3 setosa 5.2 3.68 1.58 0.3 75%
4 versicolor 5.6 2.52 4 1.2 25%
5 versicolor 5.9 2.8 4.35 1.3 50%
6 versicolor 6.3 3 4.6 1.5 75%
7 virginica 6.22 2.8 5.1 1.8 25%
8 virginica 6.5 3 5.55 2 50%
9 virginica 6.9 3.18 5.88 2.3 75%
Upvotes: 2
Reputation: 6226
I can propose you a data.table
solution. Unfortunately, I don't have a dplyr
solution in mind.
dt <- data.table::as.data.table(iris)
dt <- dt[,lapply(.SD, quantile, probs = c(.25,.5,.75)),
.SDcols = c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width"),
by = "Species"]
dt[,'quantile' := c("25%","50%","75%")]
# Sepal.Length Sepal.Width Petal.Length Petal.Width # Species Sepal.Length Sepal.Width Petal.Length Petal.Width quantile
# 1: setosa 4.800 3.200 1.400 0.2 25%
# 2: setosa 5.000 3.400 1.500 0.2 50%
# 3: setosa 5.200 3.675 1.575 0.3 75%
# 4: versicolor 5.600 2.525 4.000 1.2 25%
# 5: versicolor 5.900 2.800 4.350 1.3 50%
# 6: versicolor 6.300 3.000 4.600 1.5 75%
# 7: virginica 6.225 2.800 5.100 1.8 25%
# 8: virginica 6.500 3.000 5.550 2.0 50%
# 9: virginica 6.900 3.175 5.875 2.3 75%
Hope that helps!
Upvotes: 2