Reputation: 4592
The following dataset represents my situation:
library(dplyr)
# tibble() replaces the deprecated data_frame(); dplyr re-exports it
df <- tibble(
  G1 = rep(1:2, each = 10),
  G2 = rep(1:10, 2),
  C1 = rnorm(20),
  C2 = rnorm(20),
  C3 = rnorm(20),
  C4 = rnorm(20)
)
I want to perform the following operation:
df %>%
  group_by(G1, G2) %>%
  mutate(
    C1 = C1 - C2,
    C2 = C2 - C2,
    C3 = C3 - C2,
    C4 = C4 - C2
  )
With only four columns (C1, C2, C3 and C4) I can write this out by hand. However, I have many columns, and I need to apply the same operation to each of them. Is there a concise and simple way to extend this to many columns?
Upvotes: 1
Views: 639
Reputation: 10671
If you can find some commonality in the names of the columns you wish to mutate, you can take advantage of dplyr::mutate_at():
df %>%
  group_by(G1, G2) %>%
  # a formula lambda replaces the deprecated funs()
  mutate_at(vars(starts_with("C")), ~ . - C2)
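Because mutate() computes and stores the result for each column sequentially, C2 is already all zeros by the time C3 and C4 are processed, so those columns come back unchanged. You have two options to get around this. You could move C2 to the end of your data.frame, for example with select(df, -C2, C2), so that it is mutated last, or add a second line like this: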
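set.seed(1)
library(dplyr)
# tibble() replaces the deprecated data_frame()
df <- tibble(
  G1 = rep(1:2, each = 10),
  G2 = rep(1:10, 2),
  C1 = rnorm(20, 0),
  C2 = rnorm(20, 1),
  C3 = rnorm(20, 10),
  C4 = rnorm(20, 100)
)
df %>%
  # first pass: every C* column except C2
  mutate_at(vars(starts_with("C"), -C2), ~ . - C2) %>%
  # second pass: C2 itself, once nothing else needs its original values
  mutate_at(vars(C2), ~ . - C2)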
The first mutate_at() call handles every column except C2. The second call then goes back and mutates C2, after C2 has already been subtracted from all the other columns.
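On dplyr 1.0.0 or later, the same two-step idea can probably be written with across() inside a single mutate(). This is only a sketch under that version assumption; the name res is just for illustration. C2 is excluded from the across() selection and zeroed out in a later expression, so every other column is still subtracted against the original C2 values.

```r
library(dplyr)

set.seed(1)
df <- tibble(
  G1 = rep(1:2, each = 10),
  G2 = rep(1:10, 2),
  C1 = rnorm(20, 0),
  C2 = rnorm(20, 1),
  C3 = rnorm(20, 10),
  C4 = rnorm(20, 100)
)

# Assumes dplyr >= 1.0.0, where across() is available.
# Expressions inside mutate() run in order: the across() call subtracts C2
# from every other C* column first, and only then is C2 itself overwritten.
res <- df %>%
  mutate(
    across(c(starts_with("C"), -C2), ~ .x - C2),
    C2 = C2 - C2
  )
```

Comparing res$C3 against df$C3 - df$C2 is a quick way to confirm that the later columns really were shifted.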
Upvotes: 2
Reputation: 14360
How about using data.table, specifying the columns you want with .SDcols?
library(data.table)
cols <- grep("C", colnames(df), value = TRUE)
dt <- setDT(df)[, lapply(.SD, function(x) x - C2), by=.(G1,G2), .SDcols = cols]
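If you would rather update the columns in place, a variant with := might look like the sketch below. It rebuilds the example data so it runs on its own; orig is a hypothetical copy kept only for checking the result, and the "^C" pattern assumes the target columns all start with "C". As far as I can tell, lapply() builds every result before anything is assigned, so C2 still holds its original values for all columns.

```r
library(data.table)

set.seed(1)
dt <- data.table(
  G1 = rep(1:2, each = 10),
  G2 = rep(1:10, 2),
  C1 = rnorm(20, 0),
  C2 = rnorm(20, 1),
  C3 = rnorm(20, 10),
  C4 = rnorm(20, 100)
)

# Keep an untouched copy for comparison, since := modifies dt by reference.
orig <- copy(dt)

# Assumption: every column to adjust has a name starting with "C".
cols <- grep("^C", names(dt), value = TRUE)

# Subtract C2 from each selected column and assign the results back in place.
dt[, (cols) := lapply(.SD, function(x) x - C2), .SDcols = cols]
```

No by = is used here because the subtraction is row-wise, so grouping by G1 and G2 would not change the values.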
Upvotes: 2