Reputation: 11707
Let's say I have a dataframe called DT
. I want to group based upon the "Country" column, and find the maximum value for columns B, C, D, E, F. I then want to create new columns called B_max, C_max, D_max, E_max, and F_max that have the maximum value for each row.
I can trivially do this in Base R with a long list of commands. I'm wondering if there is a more parsimonious way to do this in dplyr? So, for example:
DT %>%
group_by(Country) %>%
select(a, b, c, d, e, f)%>%
mutate(max = max())
That gets me the correct maximum values in a dataframe which is collapsed by country. How would I then un-collapse this data-frame and create all the a_max, b_max, c_max, and so on?
Upvotes: 0
Views: 49
Reputation: 16121
Here's a solution using the mtcars
dataset:
library(dplyr)
mtcars %>%
group_by(cyl) %>%
select(cyl, disp, hp, drat) %>%
mutate_all(funs(max = max)) %>%
ungroup()
# # A tibble: 32 x 7
# cyl disp hp drat disp_max hp_max drat_max
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 6.00 160 110 3.90 258 175 3.92
# 2 6.00 160 110 3.90 258 175 3.92
# 3 4.00 108 93.0 3.85 147 113 4.93
# 4 6.00 258 110 3.08 258 175 3.92
# 5 8.00 360 175 3.15 472 335 4.22
# 6 6.00 225 105 2.76 258 175 3.92
# 7 8.00 360 245 3.21 472 335 4.22
# 8 4.00 147 62.0 3.69 147 113 4.93
# 9 4.00 141 95.0 3.92 147 113 4.93
# 10 6.00 168 123 3.92 258 175 3.92
# # ... with 22 more rows
Note that (a) you have to include the grouping variable inside select
, otherwise the process will force it to be included and show you a warning, (b) you need to use funs()
within mutate_all if you want to have the _max
prefix and keep the original columns, (c) the grouping variable is not affected by mutate_all
.
Otherwise, if you don't use funs()
it will look like this:
mtcars %>%
group_by(cyl) %>%
select(cyl, disp, hp, drat) %>%
mutate_all(max) %>%
ungroup()
# # A tibble: 32 x 4
# cyl disp hp drat
# <dbl> <dbl> <dbl> <dbl>
# 1 6.00 258 175 3.92
# 2 6.00 258 175 3.92
# 3 4.00 147 113 4.93
# 4 6.00 258 175 3.92
# 5 8.00 472 335 4.22
# 6 6.00 258 175 3.92
# 7 8.00 472 335 4.22
# 8 4.00 147 113 4.93
# 9 4.00 147 113 4.93
# 10 6.00 258 175 3.92
# # ... with 22 more rows
You can see that in this case the original columns were replaced by the max values as they have the same names.
If you want to do follow the same method, but without select
in order to keep all your original columns you can use mutate_at
instead like this:
library(dplyr)
mtcars %>%
group_by(cyl) %>%
mutate_at(vars(disp,hp,drat), funs(max = max)) %>%
ungroup()
# # A tibble: 32 x 14
# mpg cyl disp hp drat wt qsec vs am gear carb disp_max hp_max drat_max
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 21.0 6.00 160 110 3.90 2.62 16.5 0 1.00 4.00 4.00 258 175 3.92
# 2 21.0 6.00 160 110 3.90 2.88 17.0 0 1.00 4.00 4.00 258 175 3.92
# 3 22.8 4.00 108 93.0 3.85 2.32 18.6 1.00 1.00 4.00 1.00 147 113 4.93
# 4 21.4 6.00 258 110 3.08 3.22 19.4 1.00 0 3.00 1.00 258 175 3.92
# 5 18.7 8.00 360 175 3.15 3.44 17.0 0 0 3.00 2.00 472 335 4.22
# 6 18.1 6.00 225 105 2.76 3.46 20.2 1.00 0 3.00 1.00 258 175 3.92
# 7 14.3 8.00 360 245 3.21 3.57 15.8 0 0 3.00 4.00 472 335 4.22
# 8 24.4 4.00 147 62.0 3.69 3.19 20.0 1.00 0 4.00 2.00 147 113 4.93
# 9 22.8 4.00 141 95.0 3.92 3.15 22.9 1.00 0 4.00 2.00 147 113 4.93
# 10 19.2 6.00 168 123 3.92 3.44 18.3 1.00 0 4.00 4.00 258 175 3.92
# # ... with 22 more rows
Upvotes: 3