user49017
user49017

Reputation: 137

Pairwise Difference Between Pairs of Observations within an R data frame

I have a two data frames. One data frame identifies pairs of observations where pairs are unique but an element can be part of more than one pair (See below).

PairID  PairElement1      PairElement2
1       A                 B
2       C                 D
3       E                 F
4       A                 C
5       B                 D
6       G                 H

The second data frame has data on the elements that make up the pairs (see below).

PairElement    var1     var2    var3    var4
A               8        4       3       8
B               8        8       7       6
C               10       1       0       1
D               1        2       3       10
E               10       10      6       4
F               5        6       8       6
G               1        2       6       5
H               8        0       9       0

My goal is to calculate the pairwise difference for each variable such that, for each pair, observations represent the difference between the first and the second pair elements (see below).

PairID  Pair1   Pair2   var1d   var2d   var3d   var4d
1       A       B       0        -4     -4      2
2       C       D       9        -1     -3      -9
3       E       F       5        4      -2      -2
4       A       C       -2       3       3      7
5       B       D       7        6       4      -4
6       G       H       -7       2      -3      5

I tried merging the two or splitting the data frame by pair element, but I couldn't come up with a solution that works consistently.

Upvotes: 2

Views: 586

Answers (1)

thelatemail
thelatemail

Reputation: 93908

Reshape to long so that you can merge on PairElement as a single column, do the merge, order it in reverse, get the difference within each PairID:

tmp <- merge(
  reshape(one, idvar="PairID", sep="", varying=-1, direction="long"),
  two,
  by = "PairElement"
)
tmp <- tmp[order(tmp$PairID, -tmp$time),]
aggregate(cbind(var1,var2,var3,var4) ~ PairID, data = tmp, FUN=diff)

#  PairID var1 var2 var3 var4
#1      1    0   -4   -4    2
#2      2    9   -1   -3   -9
#3      3    5    4   -2   -2
#4      4   -2    3    3    7
#5      5    7    6    4   -4
#6      6   -7    2   -3    5

In dplyr/tidyr speak, something like:

library(dplyr)
library(tidyr)

one %>%
  pivot_longer(-PairID, values_to="PairElement") %>%
  right_join(two, by="PairElement") %>%
  group_by(PairID) %>%
  arrange(desc(name)) %>%
  select(-name, -PairElement) %>% 
  summarise_all(diff)

Upvotes: 2

Related Questions