Mutate each column with a function of two parameters, grouped by another column

Question

The following dataset represents my situation:

library(dplyr)
# tibble() replaces the deprecated data_frame()
df <- tibble(
  G1 = rep(1:2, each = 10),
  G2 = rep(1:10, 2),
  C1 = rnorm(20),
  C2 = rnorm(20),
  C3 = rnorm(20),
  C4 = rnorm(20)
)

I want to perform the following operation:

df %>%
  group_by(G1, G2) %>%
  mutate(
    C1 = C1 - C2,
    C2 = C2 - C2,
    C3 = C3 - C2,
    C4 = C4 - C2
  )

With only four columns (C1, C2, C3 and C4), I can write this out by hand as above. However, I have many columns, and I need to apply the same operation to each of them. Is there a concise way to extend this to many columns?

Nate · Accepted Answer

If you can find some commonality in the names of the columns you wish to mutate, you can take advantage of dplyr::mutate_at():

df %>%
    group_by(G1, G2) %>%
    # a formula lambda replaces the deprecated funs()
    mutate_at(vars(starts_with("C")), ~ . - C2)
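One caveat with this call is that mutate() evaluates its expressions in order, so a column that is overwritten early is seen in its new form by everything after it. The sketch below shows the effect on a hypothetical three-column toy tibble (the names `toy`, `a`, `b` and `c` are illustrative and not part of the question's data).

```r
library(dplyr)

# Hypothetical toy data, not taken from the question
toy <- tibble(a = 10, b = 3, c = 5)

# Expressions are evaluated left to right:
#   a uses the original b (10 - 3),
#   b becomes 0,
#   c then subtracts the new b (5 - 0)
out <- toy %>%
  mutate(a = a - b, b = b - b, c = c - b)

out
# a = 7, b = 0, c = 5
```

Here `c` comes back unchanged, which is the same thing that happens to every column placed after C2 when C2 is included in the selection.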

Edit

Because mutate() computes and stores the result for each column sequentially, C2 is already zero by the time the columns after it are processed. You have two options to get around the problem. You could move C2 to the end of your data.frame, for example with select(df, -C2, C2), or add a second step like this:

set.seed(1)
library(dplyr)
# tibble() replaces the deprecated data_frame()
df <- tibble(
    G1 = rep(1:2, each = 10),
    G2 = rep(1:10, 2),
    C1 = rnorm(20, 0),
    C2 = rnorm(20, 1),
    C3 = rnorm(20, 10),
    C4 = rnorm(20, 100)
)


df %>%
    mutate_at(vars(starts_with("C"), -C2), funs(. - C2)) %>%
    mutate_at(vars(C2), funs(. - C2))

The first mutate_at() call does the subtraction for every column except C2. The second call then goes back and mutates C2, after it has already been subtracted from the other columns.
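On dplyr 1.0.0 or later, the same two-step idea can be written with across(), which supersedes the mutate_at() family. This is a minimal sketch rather than part of the original answer; it assumes the columns to change all start with "C" and that C2 is the column to subtract, and the name `result` is only illustrative.

```r
library(dplyr)

set.seed(1)
df <- tibble(
  G1 = rep(1:2, each = 10),
  G2 = rep(1:10, 2),
  C1 = rnorm(20, 0),
  C2 = rnorm(20, 1),
  C3 = rnorm(20, 10),
  C4 = rnorm(20, 100)
)

result <- df %>%
  # Step 1: subtract the still-original C2 from every C column except C2
  mutate(across(c(starts_with("C"), -C2), ~ .x - C2)) %>%
  # Step 2: only now overwrite C2 itself
  mutate(C2 = C2 - C2)

result
```

Keeping C2 out of the first selection means the result does not depend on where C2 sits among the columns.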
