887
887

Reputation: 619

How to subtract multiple similarly labeled columns from one another?

I have a data set resembling the following:

enter image description here

I want to subtract the value of all Tuesdays from corresponding values of Mondays. So the resulting first column would be Monday_1 - Tuesday_1, second resulting column would be Monday_2 - Tuesday_2, and so on.

Here is what the resulting data frame would look like.

enter image description here

I would like to keep the names of the Monday columns the same and select out the Tuesday cols. My real data set is much wider than this so not having to manually subtract is much easier. Ideally I want a dplyr solution.

Data:

df <- structure(list(id = 1:7, Monday_1 = c(4L, 11L, 18L, 6L, 20L, 
5L, 12L), Monday_2 = c(20L, 3L, 20L, 12L, 1L, 10L, 15L), Monday_3 = c(14L, 
20L, 8L, 17L, 4L, 2L, 3L), Monday_4 = c(13L, 8L, 11L, 3L, 12L, 
14L, 17L), Tuesday_1 = c(1L, 14L, 7L, 16L, 2L, 6L, 12L), Tuesday_2 = c(10L, 
8L, 1L, 16L, 10L, 13L, 9L), Tuesday_3 = c(4L, 9L, 9L, 8L, 7L, 
9L, 12L), Tuesday_4 = c(12L, 18L, 3L, 18L, 6L, 11L, 8L)), class = "data.frame", row.names = c(NA, 
-7L))

Upvotes: 5

Views: 610

Answers (4)

Andre Wildberg
Andre Wildberg

Reputation: 19271

Using dplyrs summarize

EDIT: Use reframe since 1.1.0

library(dplyr)

df %>% 
  reframe(id, across(starts_with("Monday")) - across(starts_with("Tuesday")))
  id Monday_1 Monday_2 Monday_3 Monday_4
1  1        3       10       10        1
2  2       -3       -5       11      -10
3  3       11       19       -1        8
4  4      -10       -4        9      -15
5  5       18       -9       -3        6
6  6       -1       -3       -7        3
7  7        0        6       -9        9

Upvotes: 2

TarJae
TarJae

Reputation: 79286

Thx to @TimTeaFan input: improved code: using .keep="unsued removing the selectrow:

library(dplyr)
library(stringr)
df %>% 
  mutate(across(starts_with('Monday_'),
                ~ . - get(str_replace(cur_column(), 'Monday_', 'Tuesday_')), 
                .names = "{str_replace(.col, fixed('Monday'), 'diff')}")) %>% 
  select(contains(c("id", "diff")))

Here is an alternative tidyverse approach:

library(dplyr)
library(stringr)
df %>% 
  mutate(across(starts_with('Monday_'),
                ~ . - get(str_replace(cur_column(), 'Monday_', 'Tuesday_')), 
                .names = "{str_replace(.col, fixed('Monday'), 'diff')}")) %>% 
  select(contains(c("id", "diff")))
  id diff_1 diff_2 diff_3 diff_4
1  1      3     10     10      1
2  2     -3     -5     11    -10
3  3     11     19     -1      8
4  4    -10     -4      9    -15
5  5     18     -9     -3      6
6  6     -1     -3     -7      3
7  7      0      6     -9      9

@TimTeaFan has a package dplyover which contains across2 that should also work, but I can't implement it:

library(dplyr)
library(dplyover) # https://github.com/TimTeaFan/dplyover


df %>%
  mutate(across2(starts_with("Monday_"),
                 starts_with("Tuesday_"),
                 ~ .x - .y,
                 .names = "{diff}"))

#error:

Error in `mutate()`:
! Problem while computing `..1 = across2(...)`.
Caused by error in `across2_setup()`:
! Problem with `across2()`  input `.names`.
✖ Unrecognized glue specification `{...}` detected in `.names`.
ℹ `.names` only supports the following expressions: '{xcol}', '{ycol}', '{idx}' or '{fn}'.

Upvotes: 1

TimTeaFan
TimTeaFan

Reputation: 18581

Thanks @TarJae for pointing me to this.

We can use dplyover::across2(). If you want to keep the names Monday_1 etc. then we need to set the .names argument to {xcol}, in this case the values of the Monday_1 etc. columns get overwritten.

Further, if we want to keep the id column and drop all other "used" columns, we need to use transmute(id = id, across2(...)), since dplyover doesn't support yet the .keep argument in dplyr::mutate.

library(dplyr)
library(dplyover)

df %>%
  transmute(id = id,
            across2(starts_with("Monday_"),
                 starts_with("Tuesday_"),
                 ~ .x - .y,
                 .names = "{xcol}"))
#>   id Monday_1 Monday_2 Monday_3 Monday_4
#> 1  1        3       10       10        1
#> 2  2       -3       -5       11      -10
#> 3  3       11       19       -1        8
#> 4  4      -10       -4        9      -15
#> 5  5       18       -9       -3        6
#> 6  6       -1       -3       -7        3
#> 7  7        0        6       -9        9

If we want to create new names like diff_1 etc. the trick is to use set the .names argument to "diff_{suf}" which says, "use the string "diff_" and append the common suffix of each variable pair {suf}.

With this the output names will be diff_1, diff_2 etc.

df %>%
  transmute(id = id,
            across2(starts_with("Monday_"),
                    starts_with("Tuesday_"),
                     ~ .x - .y,
                    .names = "diff_{suf}"))

#>   id diff_1 diff_2 diff_3 diff_4
#> 1  1      3     10     10      1
#> 2  2     -3     -5     11    -10
#> 3  3     11     19     -1      8
#> 4  4    -10     -4      9    -15
#> 5  5     18     -9     -3      6
#> 6  6     -1     -3     -7      3
#> 7  7      0      6     -9      9

Created on 2023-01-22 with reprex v2.0.2

Upvotes: 2

mnist
mnist

Reputation: 6954

using across gives you this easily:

df |> 
  mutate(across(.cols = starts_with("Monday")) - across(.cols = starts_with("Tuesday"))) |> 
  select(id, starts_with("Monday"))
#>   id Monday_1 Monday_2 Monday_3 Monday_4
#> 1  1        3       10       10        1
#> 2  2       -3       -5       11      -10
#> 3  3       11       19       -1        8
#> 4  4      -10       -4        9      -15
#> 5  5       18       -9       -3        6
#> 6  6       -1       -3       -7        3
#> 7  7        0        6       -9        9

Upvotes: 6

Related Questions