Parseltongue
Parseltongue

Reputation: 11707

Dynamically create new columns based upon group maximums

Let's say I have a dataframe called DT. I want to group based upon the "Country" column, and find the maximum value for columns B, C, D, E, F. I then want to create new columns called B_max, C_max, D_max, E_max, and F_max that have the maximum value for each row.

I can trivially do this in Base R with a long list of commands. I'm wondering if there is a more parsimonious way to do this in dplyr? So, for example:

DT %>% 
  group_by(Country) %>% 
  select(a, b, c, d, e, f)%>%
  mutate(max = max())

That gets me the correct maximum values in a dataframe which is collapsed by country. How would I then un-collapse this data-frame and create all the a_max, b_max, c_max, and so on?

Upvotes: 0

Views: 49

Answers (1)

AntoniosK
AntoniosK

Reputation: 16121

Here's a solution using the mtcars dataset:

library(dplyr)

mtcars %>%
  group_by(cyl) %>%
  select(cyl, disp, hp, drat) %>%
  mutate_all(funs(max = max)) %>%
  ungroup()

# # A tibble: 32 x 7
#     cyl  disp    hp  drat disp_max hp_max drat_max
#   <dbl> <dbl> <dbl> <dbl>    <dbl>  <dbl>    <dbl>
# 1  6.00   160 110    3.90      258    175     3.92
# 2  6.00   160 110    3.90      258    175     3.92
# 3  4.00   108  93.0  3.85      147    113     4.93
# 4  6.00   258 110    3.08      258    175     3.92
# 5  8.00   360 175    3.15      472    335     4.22
# 6  6.00   225 105    2.76      258    175     3.92
# 7  8.00   360 245    3.21      472    335     4.22
# 8  4.00   147  62.0  3.69      147    113     4.93
# 9  4.00   141  95.0  3.92      147    113     4.93
# 10 6.00   168 123    3.92      258    175     3.92
# # ... with 22 more rows

Note that (a) you have to include the grouping variable inside select, otherwise the process will force it to be included and show you a warning, (b) you need to use funs() within mutate_all if you want to have the _max prefix and keep the original columns, (c) the grouping variable is not affected by mutate_all.

Otherwise, if you don't use funs() it will look like this:

mtcars %>%
  group_by(cyl) %>%
  select(cyl, disp, hp, drat) %>%
  mutate_all(max) %>%
  ungroup()

# # A tibble: 32 x 4
#     cyl  disp    hp  drat
#   <dbl> <dbl> <dbl> <dbl>
# 1  6.00   258   175  3.92
# 2  6.00   258   175  3.92
# 3  4.00   147   113  4.93
# 4  6.00   258   175  3.92
# 5  8.00   472   335  4.22
# 6  6.00   258   175  3.92
# 7  8.00   472   335  4.22
# 8  4.00   147   113  4.93
# 9  4.00   147   113  4.93
# 10 6.00   258   175  3.92
# # ... with 22 more rows

You can see that in this case the original columns were replaced by the max values as they have the same names.

If you want to do follow the same method, but without select in order to keep all your original columns you can use mutate_at instead like this:

library(dplyr)

mtcars %>%
  group_by(cyl) %>%
  mutate_at(vars(disp,hp,drat), funs(max = max)) %>%
  ungroup()

# # A tibble: 32 x 14
#     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb disp_max hp_max drat_max
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>  <dbl>    <dbl>
# 1  21.0  6.00   160 110    3.90  2.62  16.5  0     1.00  4.00  4.00      258    175     3.92
# 2  21.0  6.00   160 110    3.90  2.88  17.0  0     1.00  4.00  4.00      258    175     3.92
# 3  22.8  4.00   108  93.0  3.85  2.32  18.6  1.00  1.00  4.00  1.00      147    113     4.93
# 4  21.4  6.00   258 110    3.08  3.22  19.4  1.00  0     3.00  1.00      258    175     3.92
# 5  18.7  8.00   360 175    3.15  3.44  17.0  0     0     3.00  2.00      472    335     4.22
# 6  18.1  6.00   225 105    2.76  3.46  20.2  1.00  0     3.00  1.00      258    175     3.92
# 7  14.3  8.00   360 245    3.21  3.57  15.8  0     0     3.00  4.00      472    335     4.22
# 8  24.4  4.00   147  62.0  3.69  3.19  20.0  1.00  0     4.00  2.00      147    113     4.93
# 9  22.8  4.00   141  95.0  3.92  3.15  22.9  1.00  0     4.00  2.00      147    113     4.93
# 10 19.2  6.00   168 123    3.92  3.44  18.3  1.00  0     4.00  4.00      258    175     3.92
# # ... with 22 more rows

Upvotes: 3

Related Questions