Reputation: 589
library(dplyr)
x1 <- tibble(ID = rep(1,4),
Measure = c("Height","Weight","Width","Length"),
AD1= rpois(4,10),
AD2= rpois(4,9))
x2 <- tibble(ID = rep(2,4),
Measure = c("Height","Weight","Width","Length"),
AD1= rpois(4,20),
AD2= rpois(4,19))
xdiff <- x1 %>%
full_join(x2,by="Measure",
suffix=c("_1","_2")) %>%
mutate(diff1 = AD1_1-AD1_2,
diff2 = AD2_1-AD2_2)
ID_1 Measure AD1_1 AD2_1 ID_2 AD1_2 AD2_2 diff1 diff2
<dbl> <chr> <int> <int> <dbl> <int> <int> <int> <int>
1 1 Height 10 12 2 20 15 -10 -3
2 1 Weight 7 10 2 11 16 -4 -6
3 1 Width 12 11 2 22 17 -10 -6
4 1 Length 11 9 2 24 16 -13 -7
xdesire <- xdiff %>%
select(Measure,AD1_1,AD1_2,diff1,AD2_1,AD2_2,diff2)
Measure AD1_1 AD1_2 diff1 AD2_1 AD2_2 diff2
<chr> <int> <int> <int> <int> <int> <int>
1 Height 10 20 -10 12 15 -3
2 Weight 7 11 -4 10 16 -6
3 Width 12 22 -10 11 17 -6
4 Length 11 24 -13 9 16 -7
full_join(), with default arguments, will add suffixes to the duplicate variables in sequential order of x,x,x,y,y,y. I could technically use select() to order the columns as desired, but if the data have large amount of columns, e.g., AD1:AD200, it would be very inefficient.
dply: order columns alphabetically in R shows method of order the columns alphabetically, but in my case, there is "difference" column that is needed between every variables (AD1,diff1,AD2,diff).
I was wondering if full_join() allows duplicate variables to be alternating, and if I could directly add the calculations with them through join() function.
Of course, other methods, such as more advanced select() argument is welcomed too.
Upvotes: 1
Views: 185
Reputation: 47320
You can use {powerjoin} to nest the conflicting columns together with the created diff column, then unpack all of them:
library(powerjoin)
library(tidyverse)
x1 <- tibble(Measure = c("Height","Weight","Width","Length"),
AD1= rpois(4,10),
AD2= rpois(4,9))
x2 <- tibble(Measure = c("Height","Weight","Width","Length"),
AD1= rpois(4,20),
AD2= rpois(4,19))
power_left_join(
x1, x2, by = "Measure", conflict = ~ tibble("1" = .x, "2" = .y, "diff" = .x - .y)) %>%
unpack(where(is.data.frame), names_sep = "_")
#> # A tibble: 4 × 7
#> Measure AD1_1 AD1_2 AD1_diff AD2_1 AD2_2 AD2_diff
#> <chr> <int> <int> <int> <int> <int> <int>
#> 1 Height 16 27 -11 6 17 -11
#> 2 Weight 7 14 -7 10 21 -11
#> 3 Width 14 21 -7 16 22 -6
#> 4 Length 11 23 -12 6 21 -15
Upvotes: 1
Reputation: 887291
An option is to order
on the first numeric substring extracted with parse_number
on the column names
library(dplyr)
xdiff %>%
select(Measure, order(readr::parse_number(names(.))), -starts_with("ID"))
-output
# A tibble: 4 x 7
# Measure AD1_1 AD1_2 diff1 AD2_1 AD2_2 diff2
# <chr> <int> <int> <int> <int> <int> <int>
#1 Height 10 25 -15 5 20 -15
#2 Weight 16 28 -12 13 15 -2
#3 Width 12 19 -7 9 25 -16
#4 Length 10 29 -19 4 23 -19
Upvotes: 1