Mutate new variable based on group-level statistic

Question

I want to append the group maximum to a table of observations, e.g.:

library(dplyr)

# Split by species, add the per-group maximum, then recombine
iris %>% split(iris$Species) %>% 
    lapply(function(l) mutate(l, species_max = max(Sepal.Width))) %>% 
    bind_rows() %>% .[c(1,51,101),]
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species species_max
1            5.1         3.5          1.4         0.2     setosa         4.4
51           7.0         3.2          4.7         1.4 versicolor         3.4
101          6.3         3.3          6.0         2.5  virginica         3.8

Is there a more elegant dplyr::group_by solution to achieve this?

talat · Accepted Answer

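How about this (the final slice(1) only keeps the first row of each group to shorten the printout; drop it to keep all 150 rows):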

group_by(iris, Species) %>% 
  mutate(species_max = max(Sepal.Width)) %>% 
  slice(1)

# Source: local data frame [3 x 6]
# Groups: Species [3]
# 
#   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species species_max
#          <dbl>       <dbl>        <dbl>       <dbl>     <fctr>       <dbl>
# 1          5.1         3.5          1.4         0.2     setosa         4.4
# 2          7.0         3.2          4.7         1.4 versicolor         3.4
# 3          6.3         3.3          6.0         2.5  virginica         3.8
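
If the goal is the full table with the new column on every row, as in the question, a minimal sketch is the same pipeline without slice(1) and with an ungroup() at the end. The name iris_max is only an illustration and is not part of the original answer.

```r
library(dplyr)

# Append the per-species maximum of Sepal.Width to every row of iris.
# iris_max is a hypothetical name chosen for this sketch.
iris_max <- iris %>%
  group_by(Species) %>%
  mutate(species_max = max(Sepal.Width)) %>%
  ungroup()  # drop the grouping so later verbs act on the whole table

# Rows 1, 51 and 101 are the first observation of each species.
iris_max[c(1, 51, 101), c("Species", "Sepal.Width", "species_max")]
```

The ungroup() call is optional, but it avoids surprises when later steps are not meant to run per species.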

The difficulty arises when you want one row per group: you have to summarise every existing column (for which summarise_all would be great) and, at the same time, add a new column (for which you need either a plain summarise or a mutate call).
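
With dplyr 1.0.0 or later, both steps can probably be combined in a single summarise via across(). This is a sketch under that version assumption, not part of the original answer, and first_rows is a hypothetical name. The order matters: species_max has to come before across(), because later expressions in summarise see the columns already reduced by earlier ones.

```r
library(dplyr)

# species_max is computed first, while Sepal.Width still holds the whole
# group; across() then reduces every column to its first value per group.
# first_rows is a hypothetical name chosen for this sketch.
first_rows <- iris %>%
  group_by(Species) %>%
  summarise(
    species_max = max(Sepal.Width),
    across(everything(), dplyr::first)
  )

first_rows
```

Swapping the two arguments would make max() see only the first Sepal.Width of each group, which silently gives the wrong maximum.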

In this regard, data.table is more flexible, because its j argument only needs to evaluate to a list. For comparison, this is how you could do it with data.table:

library(data.table)
dt <- as.data.table(iris)
dt[, c(lapply(.SD, first), species_max = max(Sepal.Width)), by = Species]
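
For the question's goal of appending the group maximum to every row, data.table's := operator together with by should also work. A minimal sketch, where dt_all is a hypothetical name used only here:

```r
library(data.table)

# dt_all is a hypothetical name chosen for this sketch.
dt_all <- as.data.table(iris)

# := adds the column by reference; with by = Species the maximum is
# computed per group and recycled to every row of that group.
dt_all[, species_max := max(Sepal.Width), by = Species]

# Rows 1, 51 and 101 are the first observation of each species.
dt_all[c(1, 51, 101)]
```

Because := modifies dt_all in place, no reassignment is needed and all 150 rows are kept.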
