aiorr
aiorr

Reputation: 589

Ways to full_join with alternating "non-joined duplicate variables" instead of .x,.x,.x,.y,.y,.y format, and corresponding calculation

library(dplyr)
x1 <- tibble(ID = rep(1,4),
             Measure = c("Height","Weight","Width","Length"),
             AD1= rpois(4,10),
             AD2= rpois(4,9))
x2 <- tibble(ID = rep(2,4),
             Measure = c("Height","Weight","Width","Length"),
             AD1= rpois(4,20),
             AD2= rpois(4,19))

xdiff <- x1 %>%
  full_join(x2,by="Measure",
            suffix=c("_1","_2")) %>%
  mutate(diff1 = AD1_1-AD1_2,
         diff2 = AD2_1-AD2_2)

   ID_1 Measure AD1_1 AD2_1  ID_2 AD1_2 AD2_2 diff1 diff2
  <dbl> <chr>   <int> <int> <dbl> <int> <int> <int> <int>
1     1 Height     10    12     2    20    15   -10    -3
2     1 Weight      7    10     2    11    16    -4    -6
3     1 Width      12    11     2    22    17   -10    -6
4     1 Length     11     9     2    24    16   -13    -7

xdesire <- xdiff %>%
  select(Measure,AD1_1,AD1_2,diff1,AD2_1,AD2_2,diff2)

  Measure AD1_1 AD1_2 diff1 AD2_1 AD2_2 diff2
  <chr>   <int> <int> <int> <int> <int> <int>
1 Height     10    20   -10    12    15    -3
2 Weight      7    11    -4    10    16    -6
3 Width      12    22   -10    11    17    -6
4 Length     11    24   -13     9    16    -7

full_join(), with default arguments, will add suffixes to the duplicate variables in sequential order of x,x,x,y,y,y. I could technically use select() to order the columns as desired, but if the data have large amount of columns, e.g., AD1:AD200, it would be very inefficient.

dply: order columns alphabetically in R shows method of order the columns alphabetically, but in my case, there is "difference" column that is needed between every variables (AD1,diff1,AD2,diff).

I was wondering if full_join() allows duplicate variables to be alternating, and if I could directly add the calculations with them through join() function.

Of course, other methods, such as more advanced select() argument is welcomed too.

Upvotes: 1

Views: 185

Answers (2)

moodymudskipper
moodymudskipper

Reputation: 47320

You can use {powerjoin} to nest the conflicting columns together with the created diff column, then unpack all of them:

library(powerjoin)
library(tidyverse)
x1 <- tibble(Measure = c("Height","Weight","Width","Length"),
             AD1= rpois(4,10),
             AD2= rpois(4,9))
x2 <- tibble(Measure = c("Height","Weight","Width","Length"),
             AD1= rpois(4,20),
             AD2= rpois(4,19))

power_left_join(
  x1, x2, by = "Measure", conflict = ~ tibble("1" = .x, "2" = .y, "diff" = .x - .y)) %>% 
  unpack(where(is.data.frame), names_sep = "_")
#> # A tibble: 4 × 7
#>   Measure AD1_1 AD1_2 AD1_diff AD2_1 AD2_2 AD2_diff
#>   <chr>   <int> <int>    <int> <int> <int>    <int>
#> 1 Height     16    27      -11     6    17      -11
#> 2 Weight      7    14       -7    10    21      -11
#> 3 Width      14    21       -7    16    22       -6
#> 4 Length     11    23      -12     6    21      -15

Upvotes: 1

akrun
akrun

Reputation: 887291

An option is to order on the first numeric substring extracted with parse_number on the column names

library(dplyr)
xdiff %>%
  select(Measure, order(readr::parse_number(names(.))),  -starts_with("ID"))

-output

# A tibble: 4 x 7
#  Measure AD1_1 AD1_2 diff1 AD2_1 AD2_2 diff2
#  <chr>   <int> <int> <int> <int> <int> <int>
#1 Height     10    25   -15     5    20   -15
#2 Weight     16    28   -12    13    15    -2
#3 Width      12    19    -7     9    25   -16
#4 Length     10    29   -19     4    23   -19

Upvotes: 1

Related Questions