Suppose I have a data frame that looks like this:
fact_code style_serial ss rib button rib_s button_s
1008 style_1018 1 0 0 1 1
1008 style_1018 0 1 0 1 1
1008 style_1018 0 1 0 1 1
1008 style_1018 0 0 1 1 1
1008 style_1003 1 0 1 0 1
1008 style_1003 0 0 1 0 1
1008 style_1003 0 0 0 0 1
1008 style_1003 0 0 0 0 1
1004 style_1197 1 0 0 1 0
1004 style_1197 0 0 0 1 0
1004 style_1197 0 0 0 1 0
1004 style_1197 0 1 0 1 0
The key variables, rib and button, are dummy variables. They indicate whether a particular garment style produced by a factory has a rib, a button, or both. I want to take the maximum of each of these dummy variables, grouped by fact_code and style_serial; in this case I have named the results rib_s and button_s.
The variables rib_s and button_s were generated as follows:
df <- df %>% group_by(fact_code, style_serial) %>% mutate(rib_s = max(rib, na.rm = TRUE))
df <- df %>% group_by(fact_code, style_serial) %>% mutate(button_s = max(button, na.rm = TRUE))
Now suppose that I have around 20 such variables. I would like to write a loop that runs once per variable and executes the code above for each of the 20 dummy variables.
I have tried this for the 2 variables as a test:
for (xx in c("rib", "button")){
df <- df %>%
group_by_(fact_code, style_serial) %>%
yy <- paste0(c(xx, "s"), collapse = "_") %>%
mutate_(yy = max(xx, na.rm = TRUE))
}
But it gives me the following error message:
Error in UseMethod("mutate_") :
no applicable method for 'mutate_' applied to an object of class "character"
I have also tried base R functions such as tapply and aggregate, but I always get error messages.
Is there a way to get around this problem?
Upvotes: 1
Reputation: 50668
This can be solved very succinctly using dplyr::mutate_at:
library(dplyr)
key <- c("rib", "button")
df %>%
group_by(fact_code, style_serial) %>%
mutate_at(vars(key), funs(max = max(.)))
## A tibble: 12 x 9
## Groups: fact_code, style_serial [3]
# fact_code style_serial ss rib button rib_s button_s rib_max button_max
# <int> <fct> <int> <int> <int> <int> <int> <dbl> <dbl>
# 1 1008 style_1018 1 0 0 1 1 1. 1.
# 2 1008 style_1018 0 1 0 1 1 1. 1.
# 3 1008 style_1018 0 1 0 1 1 1. 1.
# 4 1008 style_1018 0 0 1 1 1 1. 1.
# 5 1008 style_1003 1 0 1 0 1 0. 1.
# 6 1008 style_1003 0 0 1 0 1 0. 1.
# 7 1008 style_1003 0 0 0 0 1 0. 1.
# 8 1008 style_1003 0 0 0 0 1 0. 1.
# 9 1004 style_1197 1 0 0 1 0 1. 0.
#10 1004 style_1197 0 0 0 1 0 1. 0.
#11 1004 style_1197 0 0 0 1 0 1. 0.
#12 1004 style_1197 0 1 0 1 0 1. 0.
This automatically calculates the per-group maximum for the variables given in key, and creates new columns by appending _max to the corresponding column names. Note that you can also use the usual select semantics (e.g. contains, matches, starts_with, ends_with etc.) within vars(...) if you don't want to (or can't) define key beforehand.
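If you are on a newer dplyr (this sketch assumes version 1.0.0 or later, where across() is available and funs() has been deprecated), the same idea can be written with mutate() and across(). The small data frame below is only an illustrative subset of the question's data, and the _s suffix and na.rm = TRUE are taken from the question rather than from the call above:

```r
library(dplyr)

# Illustrative subset of the question's data (two rows per style)
df_small <- data.frame(
  fact_code    = c(1008, 1008, 1008, 1008, 1004, 1004),
  style_serial = c("style_1018", "style_1018", "style_1003", "style_1003",
                   "style_1197", "style_1197"),
  rib          = c(0, 1, 0, 0, 0, 1),
  button       = c(0, 1, 1, 0, 0, 0)
)

key <- c("rib", "button")

res <- df_small %>%
  group_by(fact_code, style_serial) %>%
  # all_of() selects the columns named in the character vector `key`;
  # .names builds the new column names ("rib_s", "button_s")
  mutate(across(all_of(key), ~ max(.x, na.rm = TRUE), .names = "{.col}_s")) %>%
  ungroup()

res
```

Changing the .names template (for example to "{.col}_max") should reproduce the column names used above.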
df <- read.table(text =
"fact_code style_serial ss rib button rib_s button_s
1008 style_1018 1 0 0 1 1
1008 style_1018 0 1 0 1 1
1008 style_1018 0 1 0 1 1
1008 style_1018 0 0 1 1 1
1008 style_1003 1 0 1 0 1
1008 style_1003 0 0 1 0 1
1008 style_1003 0 0 0 0 1
1008 style_1003 0 0 0 0 1
1004 style_1197 1 0 0 1 0
1004 style_1197 0 0 0 1 0
1004 style_1197 0 0 0 1 0
1004 style_1197 0 1 0 1 0", header = TRUE)
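For completeness, the loop from the question can also be made to work. In the original attempt the assignment to yy sits in the middle of the pipe chain, so the character string returned by paste0 is what ends up being piped into mutate_, which explains the error message. The following is a minimal sketch of one way to repair it, assuming a dplyr version that supports the .data pronoun and the := operator inside mutate(); the data frame df_loop is a hypothetical cut-down version of the sample data:

```r
library(dplyr)

# Hypothetical cut-down version of the sample data
df_loop <- data.frame(
  fact_code    = c(1008, 1008, 1004, 1004),
  style_serial = c("style_1018", "style_1018", "style_1197", "style_1197"),
  rib          = c(0, 1, 0, 0),
  button       = c(1, 0, 0, 0)
)

for (xx in c("rib", "button")) {
  # Build the new column name before the pipe, not inside it
  yy <- paste0(xx, "_s")
  df_loop <- df_loop %>%
    group_by(fact_code, style_serial) %>%
    # `!!yy :=` names the new column from a string;
    # `.data[[xx]]` looks up the input column by its name
    mutate(!!yy := max(.data[[xx]], na.rm = TRUE)) %>%
    ungroup()
}

df_loop
```

The loop gives the same kind of result as the single vectorised call, but it regroups the data on every iteration, so the mutate_at approach is usually the tidier choice.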
Upvotes: 2