G. Rocher
G. Rocher

Reputation: 63

How to mutate multiple columns following name pattern?

I have a dataset with multiple columns that follow a name pattern, and I need to calculate new columns that is the product of two other columns. I am looking for a tidyverse option, but I would want to avoid to do a pivot_longer as the dataset has >million rows.

Example dataset

library(dplyr)

df <- tibble(
  jan_mean = runif(10),
  feb_mean = runif(10),
  mar_mean = runif(10),
  jan_sd = runif(10),
  feb_sd = runif(10),
  mar_sd = runif(10),
)

I can do it manually like this:

df2 <- df %>% 
  mutate(jan_cv= jan_mean/jan_sd,
         feb_cv= feb_mean/feb_sd,
         mar_cv= mar_mean/mar_sd
         )

This is a simple example, but I have similar operations for monthly values.

EDIT 1

I need to do this for large datasets and I was worried that pivot_longer would be quite consuming, so I did a quick comparison of the three methods.

Method 1 is the manual way, Method 2 is the short version suggested by @Tarjae, and Method 3 is using pivot longer:


tic("Method 1: manual option")
df2 <- df %>% 
  mutate(jan_cv= jan_mean/jan_sd,
         feb_cv= feb_mean/feb_sd,
         mar_cv= mar_mean/mar_sd
  )
toc()

tic("Method 2: Short option")
df2 <- df %>%
  mutate(across(ends_with('_mean'), ~ . /
                  get(str_replace(cur_column(), "mean$", "sd")), .names = "{.col}_cv")) %>%
  rename_at(vars(ends_with('cv')), ~ str_remove(., "\\_mean"))
toc()

tic("Method 3: pivot wider option")
df2 <- df %>% 
  mutate(id = row_number()) %>% 
  pivot_longer(-id, names_to = c("month", ".value"), names_sep = "_") %>% 
  mutate(cv = mean / sd) %>% 
  pivot_wider(names_from = "month", values_from = c(mean, sd, cv), names_glue = "{month}_{.value}") %>% 
  select(-id)
toc()



The results are:

Method 1: manual option: 0.05 sec elapsed
Method 2: Short option: 0.01 sec elapsed
Method 3: pivot wider option: 0.19 sec elapsed

So method 2 is even faster than manually doing each column

Upvotes: 5

Views: 1316

Answers (2)

TarJae
TarJae

Reputation: 78917

We could use across in this situation with some string manipulation with stringr:

library(dplyr)
library(stringr)
df %>%
  mutate(across(ends_with('_mean'), ~ . /
                  get(str_replace(cur_column(), "mean$", "sd")), .names = "{.col}_cv")) %>%
  rename_at(vars(ends_with('cv')), ~ str_remove(., "\\_mean"))
   jan_mean feb_mean mar_mean jan_sd feb_sd mar_sd jan_cv feb_cv mar_cv
      <dbl>    <dbl>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
 1   0.838     0.401   0.131  0.329  0.0292  0.911  2.55  13.7    0.144
 2   0.595     0.173   0.0935 0.313  0.105   0.247  1.90   1.64   0.378
 3   0.0546    0.934   0.983  0.536  0.618   0.292  0.102  1.51   3.36 
 4   0.543     0.802   0.569  0.585  0.901   0.742  0.928  0.891  0.766
 5   0.899     0.761   0.245  0.932  0.506   0.526  0.965  1.50   0.466
 6   0.832     0.875   0.947  0.390  0.613   0.607  2.13   1.43   1.56 
 7   0.268     0.421   0.930  0.869  0.873   0.612  0.308  0.483  1.52 
 8   0.475     0.217   0.330  0.0473 0.826   0.903 10.0    0.262  0.366
 9   0.379     0.425   0.479  0.931  0.381   0.223  0.407  1.12   2.15 
10   0.616     0.922   0.707  0.976  0.241   0.619  0.631  3.82   1.14 

Upvotes: 6

stefan
stefan

Reputation: 123873

One option to achieve your desired result would be to convert your data to long format which makes it easy to do the computations per month and if desired convert back to wide format afterwards. To this end I first added an identifier column to your data:

library(dplyr)
library(tidyr)

set.seed(42)

df %>% 
  mutate(id = row_number()) %>% 
  pivot_longer(-id, names_to = c("month", ".value"), names_sep = "_") %>% 
  mutate(cv = mean / sd) %>% 
  pivot_wider(names_from = "month", values_from = c(mean, sd, cv), names_glue = "{month}_{.value}") %>% 
  select(-id)
#> # A tibble: 10 × 9
#>    jan_mean feb_mean mar_mean  jan_sd feb_sd mar_sd  jan_cv feb_cv mar_cv
#>       <dbl>    <dbl>    <dbl>   <dbl>  <dbl>  <dbl>   <dbl>  <dbl>  <dbl>
#>  1    0.915    0.458   0.904  0.738   0.380  0.333    1.24   1.21   2.71 
#>  2    0.937    0.719   0.139  0.811   0.436  0.347    1.16   1.65   0.400
#>  3    0.286    0.935   0.989  0.388   0.0374 0.398    0.737 25.0    2.48 
#>  4    0.830    0.255   0.947  0.685   0.974  0.785    1.21   0.262  1.21 
#>  5    0.642    0.462   0.0824 0.00395 0.432  0.0389 163.     1.07   2.12 
#>  6    0.519    0.940   0.514  0.833   0.958  0.749    0.623  0.982  0.687
#>  7    0.737    0.978   0.390  0.00733 0.888  0.677  100.     1.10   0.576
#>  8    0.135    0.117   0.906  0.208   0.640  0.171    0.648  0.184  5.29 
#>  9    0.657    0.475   0.447  0.907   0.971  0.261    0.725  0.489  1.71 
#> 10    0.705    0.560   0.836  0.612   0.619  0.514    1.15   0.905  1.63

Upvotes: 1

Related Questions