Reputation: 619
I have a data set resembling the following:
I want to subtract the value of all Tuesdays from corresponding values of Mondays. So the resulting first column would be Monday_1 - Tuesday_1, second resulting column would be Monday_2 - Tuesday_2, and so on.
Here is what the resulting data frame would look like.
I would like to keep the names of the Monday columns the same and select out the Tuesday cols. My real data set is much wider than this so not having to manually subtract is much easier. Ideally I want a dplyr
solution.
Data:
df <- structure(list(id = 1:7, Monday_1 = c(4L, 11L, 18L, 6L, 20L,
5L, 12L), Monday_2 = c(20L, 3L, 20L, 12L, 1L, 10L, 15L), Monday_3 = c(14L,
20L, 8L, 17L, 4L, 2L, 3L), Monday_4 = c(13L, 8L, 11L, 3L, 12L,
14L, 17L), Tuesday_1 = c(1L, 14L, 7L, 16L, 2L, 6L, 12L), Tuesday_2 = c(10L,
8L, 1L, 16L, 10L, 13L, 9L), Tuesday_3 = c(4L, 9L, 9L, 8L, 7L,
9L, 12L), Tuesday_4 = c(12L, 18L, 3L, 18L, 6L, 11L, 8L)), class = "data.frame", row.names = c(NA,
-7L))
Upvotes: 5
Views: 610
Reputation: 19271
Using dplyr
s summarize
EDIT: Use reframe
since 1.1.0
library(dplyr)
df %>%
reframe(id, across(starts_with("Monday")) - across(starts_with("Tuesday")))
id Monday_1 Monday_2 Monday_3 Monday_4
1 1 3 10 10 1
2 2 -3 -5 11 -10
3 3 11 19 -1 8
4 4 -10 -4 9 -15
5 5 18 -9 -3 6
6 6 -1 -3 -7 3
7 7 0 6 -9 9
Upvotes: 2
Reputation: 79286
Thx to @TimTeaFan input:
improved code: using .keep="unsued
removing the select
row:
library(dplyr)
library(stringr)
df %>%
mutate(across(starts_with('Monday_'),
~ . - get(str_replace(cur_column(), 'Monday_', 'Tuesday_')),
.names = "{str_replace(.col, fixed('Monday'), 'diff')}")) %>%
select(contains(c("id", "diff")))
Here is an alternative tidyverse
approach:
library(dplyr)
library(stringr)
df %>%
mutate(across(starts_with('Monday_'),
~ . - get(str_replace(cur_column(), 'Monday_', 'Tuesday_')),
.names = "{str_replace(.col, fixed('Monday'), 'diff')}")) %>%
select(contains(c("id", "diff")))
id diff_1 diff_2 diff_3 diff_4
1 1 3 10 10 1
2 2 -3 -5 11 -10
3 3 11 19 -1 8
4 4 -10 -4 9 -15
5 5 18 -9 -3 6
6 6 -1 -3 -7 3
7 7 0 6 -9 9
@TimTeaFan has a package dplyover
which contains across2
that should also work, but I can't implement it:
library(dplyr)
library(dplyover) # https://github.com/TimTeaFan/dplyover
df %>%
mutate(across2(starts_with("Monday_"),
starts_with("Tuesday_"),
~ .x - .y,
.names = "{diff}"))
#error:
Error in `mutate()`:
! Problem while computing `..1 = across2(...)`.
Caused by error in `across2_setup()`:
! Problem with `across2()` input `.names`.
✖ Unrecognized glue specification `{...}` detected in `.names`.
ℹ `.names` only supports the following expressions: '{xcol}', '{ycol}', '{idx}' or '{fn}'.
Upvotes: 1
Reputation: 18581
Thanks @TarJae for pointing me to this.
We can use dplyover::across2()
. If you want to keep the names Monday_1
etc. then we need to set the .names
argument to {xcol}
, in this case the values of the Monday_1
etc. columns get overwritten.
Further, if we want to keep the id
column and drop all other "used" columns, we need to use transmute(id = id, across2(...))
, since dplyover doesn't support yet the .keep
argument in dplyr::mutate
.
library(dplyr)
library(dplyover)
df %>%
transmute(id = id,
across2(starts_with("Monday_"),
starts_with("Tuesday_"),
~ .x - .y,
.names = "{xcol}"))
#> id Monday_1 Monday_2 Monday_3 Monday_4
#> 1 1 3 10 10 1
#> 2 2 -3 -5 11 -10
#> 3 3 11 19 -1 8
#> 4 4 -10 -4 9 -15
#> 5 5 18 -9 -3 6
#> 6 6 -1 -3 -7 3
#> 7 7 0 6 -9 9
If we want to create new names like diff_1
etc. the trick is to use set the .names
argument to "diff_{suf}
" which says, "use the string "diff_"
and append the common suffix of each variable pair {suf}
.
With this the output names will be diff_1
, diff_2
etc.
df %>%
transmute(id = id,
across2(starts_with("Monday_"),
starts_with("Tuesday_"),
~ .x - .y,
.names = "diff_{suf}"))
#> id diff_1 diff_2 diff_3 diff_4
#> 1 1 3 10 10 1
#> 2 2 -3 -5 11 -10
#> 3 3 11 19 -1 8
#> 4 4 -10 -4 9 -15
#> 5 5 18 -9 -3 6
#> 6 6 -1 -3 -7 3
#> 7 7 0 6 -9 9
Created on 2023-01-22 with reprex v2.0.2
Upvotes: 2
Reputation: 6954
using across
gives you this easily:
df |>
mutate(across(.cols = starts_with("Monday")) - across(.cols = starts_with("Tuesday"))) |>
select(id, starts_with("Monday"))
#> id Monday_1 Monday_2 Monday_3 Monday_4
#> 1 1 3 10 10 1
#> 2 2 -3 -5 11 -10
#> 3 3 11 19 -1 8
#> 4 4 -10 -4 9 -15
#> 5 5 18 -9 -3 6
#> 6 6 -1 -3 -7 3
#> 7 7 0 6 -9 9
Upvotes: 6