Reputation: 1003
To visualize the problem, let's assume I have a dataset data
in R with the following columns:
I want to perform an operation on a subset of columns:
data_final <- data %>%
mutate_at(vars(T1, T2), funs(if(param > 100) {
. * T(n)_g1
} else {
. * T(n)_g2
}
How do I reference the correct column name in the expression T(n)_g1
so it fetches data from T1_g1
and T2_g1
, respectively, while mutating?
(in a real case scenario, I have much more columns and conditions, hence manually typing all possible cases is not an option)
Upvotes: 2
Views: 592
Reputation: 160417
if
needs a single comparison, but since this will be a vector, you need if_else
(or ifelse
). I don't know that you can (easily) dynamically determine the other column names based on the to-be-changed name within a quick mutate*
interface. A quick hack could be:
data %>%
mutate(
T1 = if_else(param > 100, T1_g1, T1_g2) * T1,
T2 = if_else(param > 100, T2_g1, T2_g2) * T2
)
but this only works if you have a small/static list of T*
variables to modify.
If there is a dynamic (or just "high") number of these T*
variables, one method includes reshaping the frame to a longer format. (One could argue that a long format might be a better fit for this regardless, so I'll step you through wide-long-mutate as well as wide-long-mutate-wide.)
Some data:
x <- data_frame(
param = c(1L,50L,101L,150L),
T1 = 1:4,
T2 = 5:8,
T1_g1 = (1:4)/10,
T1_g2 = (1:4)*10,
T2_g1 = (5:8)/10,
T2_g2 = (5:8)*10
)
x
# # A tibble: 4 x 7
# param T1 T2 T1_g1 T1_g2 T2_g1 T2_g2
# <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
# 1 1 1 5 0.1 10 0.5 50
# 2 50 2 6 0.2 20 0.6 60
# 3 101 3 7 0.3 30 0.7 70
# 4 150 4 8 0.4 40 0.8 80
First, the first reshaping:
x %>%
gather(k, v, -param) %>%
mutate(
num = sub("^T([0-9]+).*", "\\1", k),
k = sub("^T[0-9]+(.*)", "T\\1", k)
) %>%
spread(k, v)
# # A tibble: 8 x 5
# param num T T_g1 T_g2
# <int> <chr> <dbl> <dbl> <dbl>
# 1 1 1 1 0.1 10
# 2 1 2 5 0.5 50
# 3 50 1 2 0.2 20
# 4 50 2 6 0.6 60
# 5 101 1 3 0.3 30
# 6 101 2 7 0.7 70
# 7 150 1 4 0.4 40
# 8 150 2 8 0.8 80
What we've done is turned four rows with 3*n
columns with the T#
, T#_g1
, and T#_g2
pattern, into just 3 columns but n
times the number of rows. We preserve this n
as another column (for now). This is arguably a good format to work with in general: tidyverse
and notably ggplot2
really likes data in this format, but there is likely more I don't know.
Now the full shebang (repeating the first few lines of code):
x %>%
gather(k, v, -param) %>%
mutate(
num = sub("^T([0-9]+).*", "\\1", k),
k = sub("^T[0-9]+(.*)", "T\\1", k)
) %>%
spread(k, v) %>%
mutate(T = T * if_else(param > 100, T_g1, T_g2)) %>%
gather(k, v, -param, -num) %>%
mutate(k = if_else(grepl("^T", k), paste0("T", num, substr(k, 2, nchar(k))), k)) %>%
select(-num) %>%
spread(k, v)
# # A tibble: 4 x 7
# param T1 T1_g1 T1_g2 T2 T2_g1 T2_g2
# <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 10 0.1 10 250 0.5 50
# 2 50 40 0.2 20 360 0.6 60
# 3 101 0.900 0.3 30 4.90 0.7 70
# 4 150 1.6 0.4 40 6.4 0.8 80
after the reshaping, your initial mutate_at
concept is reduced to a single mutate(T = ...)
call. The rest involves re-hydrating the width.
If your data is large, this might be a little cumbersome. Other solutions might involve manually determining the T#
columns and manually doing the ifelse
(outside of mutate
).
Upvotes: 3