user3571389

Reputation: 335

Generating new variables using for loop and mutate function in R

Suppose I have a data frame that looks like this:

fact_code style_serial ss rib button rib_s button_s
1008      style_1018   1   0  0      1     1 
1008      style_1018   0   1  0      1     1
1008      style_1018   0   1  0      1     1
1008      style_1018   0   0  1      1     1 
1008      style_1003   1   0  1      0     1
1008      style_1003   0   0  1      0     1
1008      style_1003   0   0  0      0     1
1008      style_1003   0   0  0      0     1
1004      style_1197   1   0  0      1     0 
1004      style_1197   0   0  0      1     0
1004      style_1197   0   0  0      1     0
1004      style_1197   0   1  0      1     0

The key variables, rib and button, are dummy variables. They indicate whether a particular garment style produced by a factory has a rib, a button, or both. I then want to take the maximum of each dummy variable within groups defined by fact_code and style_serial; here I have named the results rib_s and button_s.

The variables rib_s and button_s were generated as follows:

df <- df %>% group_by(fact_code, style_serial) %>% mutate(rib_s = max(rib, na.rm = TRUE))
df <- df %>% group_by(fact_code, style_serial) %>% mutate(button_s = max(button, na.rm = TRUE))

Now suppose that I have around 20 such variables. I want to write a loop that runs once per variable and executes the above code for each of the 20 dummy variables.

I have tried this for the 2 variables as a test:

for (xx in c("rib", "button")){
df <- df %>%
group_by_(fact_code, style_serial) %>%
yy <- paste0(c(xx, "s"), collapse = "_") %>%
mutate_(yy = max(xx, na.rm = TRUE))
}

But it gives me the following error message:

Error in UseMethod("mutate_") : no applicable method for 'mutate_' applied to an object of class "character"

I have also tried base R functions such as tapply and aggregate, but I always get error messages.

Do you have a way to get round this problem?

Upvotes: 1

Views: 985

Answers (1)

Maurits Evers

Reputation: 50668

This can be solved very succinctly using dplyr::mutate_at:

library(dplyr)
key <- c("rib", "button")
df %>%
    group_by(fact_code, style_serial) %>%
    mutate_at(vars(key), funs(max = max(.)))
## A tibble: 12 x 9
## Groups:   fact_code, style_serial [3]
#   fact_code style_serial    ss   rib button rib_s button_s rib_max button_max
#       <int> <fct>        <int> <int>  <int> <int>    <int>   <dbl>      <dbl>
# 1      1008 style_1018       1     0      0     1        1      1.         1.
# 2      1008 style_1018       0     1      0     1        1      1.         1.
# 3      1008 style_1018       0     1      0     1        1      1.         1.
# 4      1008 style_1018       0     0      1     1        1      1.         1.
# 5      1008 style_1003       1     0      1     0        1      0.         1.
# 6      1008 style_1003       0     0      1     0        1      0.         1.
# 7      1008 style_1003       0     0      0     0        1      0.         1.
# 8      1008 style_1003       0     0      0     0        1      0.         1.
# 9      1004 style_1197       1     0      0     1        0      1.         0.
#10      1004 style_1197       0     0      0     1        0      1.         0.
#11      1004 style_1197       0     0      0     1        0      1.         0.
#12      1004 style_1197       0     1      0     1        0      1.         0.

This calculates the per-group maximum of each variable listed in key and creates new columns by appending _max to the corresponding column name. Note that you can also use the usual select helpers (e.g. contains, matches, starts_with, ends_with) within vars(...) if you don't want to (or can't) define key beforehand.
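As a side note, funs() has been deprecated since dplyr 0.8.0, and from dplyr 1.0.0 onwards the same idea is usually written with across(). The following is a minimal sketch under that assumption, not a replacement for the code above. The small data frame and the result name res are made up for illustration, and the .names pattern is set to produce the _s suffix used in the question rather than _max.

```r
library(dplyr)

# Hypothetical data in the same layout as the question (without the
# precomputed rib_s / button_s columns)
df <- data.frame(
  fact_code    = c(1008, 1008, 1008, 1008, 1004, 1004),
  style_serial = c("style_1018", "style_1018", "style_1003", "style_1003",
                   "style_1197", "style_1197"),
  rib          = c(0, 1, 0, 0, 0, 1),
  button       = c(0, 1, 1, 0, 0, 0)
)

# Names of the dummy variables to summarise
key <- c("rib", "button")

res <- df %>%
  group_by(fact_code, style_serial) %>%
  # For every column named in key, take the per-group maximum and store it
  # in a new column called <column>_s
  mutate(across(all_of(key), ~ max(.x, na.rm = TRUE), .names = "{.col}_s")) %>%
  ungroup()

print(res)
```

With 20 dummy variables, only key needs to change; all_of() makes it explicit that key is an external character vector of column names rather than a column of df.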


Sample data

df <- read.table(text =
    "fact_code style_serial ss rib button rib_s button_s
1008      style_1018   1   0  0      1     1
1008      style_1018   0   1  0      1     1
1008      style_1018   0   1  0      1     1
1008      style_1018   0   0  1      1     1
1008      style_1003   1   0  1      0     1
1008      style_1003   0   0  1      0     1
1008      style_1003   0   0  0      0     1
1008      style_1003   0   0  0      0     1
1004      style_1197   1   0  0      1     0
1004      style_1197   0   0  0      1     0
1004      style_1197   0   0  0      1     0
1004      style_1197   0   1  0      1     0", header = TRUE)

Upvotes: 2
