TheRimalaya

Reputation: 4592

Mutate each column with a two-parameter function, grouped by another column

The following dataset represents my situation:

library(dplyr)
df <- tibble(
  G1 = rep(1:2, each = 10),
  G2 = rep(1:10, 2),
  C1 = rnorm(20),
  C2 = rnorm(20),
  C3 = rnorm(20),
  C4 = rnorm(20)
)

I want to perform the following operation:

df %>%
  group_by(G1, G2) %>%
  mutate(
    C1 = C1 - C2,
    C2 = C2 - C2,
    C3 = C3 - C2,
    C4 = C4 - C2
  )

With only four columns (C1, C2, C3 and C4), I can write the operation out as above. However, I have many columns, and I need to apply the same operation to each of them. Is there a concise and simple solution that extends to many columns?

Upvotes: 1

Views: 639

Answers (2)

Nate

Reputation: 10671

If the columns you want to mutate share a naming pattern, you can take advantage of dplyr::mutate_at():

df %>%
    group_by(G1, G2) %>%
    mutate_at(vars(starts_with("C")), funs(. - C2))

Edit

Because mutate() evaluates the columns sequentially and stores each result before moving on, C2 is already zero by the time the columns after it are processed. There are two ways around this. You could move C2 to the last position in your data.frame with select(df, -C2, C2), or add a second step like this:

set.seed(1)
library(dplyr)
df <- tibble(
    G1 = rep(1:2, each = 10),
    G2 = rep(1:10, 2),
    C1 = rnorm(20, 0),
    C2 = rnorm(20, 1),
    C3 = rnorm(20, 10),
    C4 = rnorm(20, 100)
)


df %>%
    mutate_at(vars(starts_with("C"), -C2), funs(. - C2)) %>%
    mutate_at(vars(C2), funs(. - C2))

The first mutate_at() call handles every C column except C2. The second call then goes back and updates C2, after the original C2 has been subtracted from all the other columns.
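
To see why the order matters, here is a minimal sketch on a toy table. The name toy and its values are made up for illustration, and the second half assumes dplyr >= 1.0.0, where across() is available as the successor to mutate_at() with funs(). It is one possible way to write the two-step approach, not the only one.

```r
library(dplyr)

# Toy data (hypothetical values chosen so the arithmetic is easy to follow)
toy <- tibble(C1 = c(5, 6), C2 = c(1, 2), C3 = c(10, 20))

# Pitfall: C2 is overwritten with zeros before C3 is computed,
# so C3 - C2 subtracts 0 and C3 comes back unchanged.
naive <- toy %>%
  mutate(C1 = C1 - C2, C2 = C2 - C2, C3 = C3 - C2)

# Two-step version with across() (assumes dplyr >= 1.0.0):
# subtract C2 from every other C column first, then update C2 itself.
fixed <- toy %>%
  mutate(across(c(starts_with("C"), -C2), ~ .x - C2)) %>%
  mutate(C2 = C2 - C2)
```

In naive, C3 still holds its original values, while in fixed it has had the original C2 subtracted. Excluding C2 in the first step keeps the result independent of where C2 sits among the columns.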

Upvotes: 2

Mike H.

Reputation: 14360

How about using data.table specifying the columns you want with .SDcols?

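library(data.table)
# Select the columns whose names start with "C"
cols <- grep("^C", colnames(df), value = TRUE)
# Subtract C2 from each selected column within every (G1, G2) group
dt <- setDT(df)[, lapply(.SD, function(x) x - C2), by = .(G1, G2), .SDcols = cols]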

Upvotes: 2
