Reputation: 137
I have a two data frames. One data frame identifies pairs of observations where pairs are unique but an element can be part of more than one pair (See below).
PairID PairElement1 PairElement2
1 A B
2 C D
3 E F
4 A C
5 B D
6 G H
The second data frame has data on the elements that make up the pairs (see below).
PairElement var1 var2 var3 var4
A 8 4 3 8
B 8 8 7 6
C 10 1 0 1
D 1 2 3 10
E 10 10 6 4
F 5 6 8 6
G 1 2 6 5
H 8 0 9 0
My goal is to calculate the pairwise difference for each variable such that, for each pair, observations represent the difference between the first and the second pair elements (see below).
PairID Pair1 Pair2 var1d var2d var3d var4d
1 A B 0 -4 -4 2
2 C D 9 -1 -3 -9
3 E F 5 4 -2 -2
4 A C -2 3 3 7
5 B D 7 6 4 -4
6 G H -7 2 -3 5
I tried merging the two or splitting the data frame by pair element, but I couldn't come up with a solution that works consistently.
Upvotes: 2
Views: 586
Reputation: 93908
Reshape to long so that you can merge on PairElement
as a single column, do the merge, order it in reverse, get the diff
erence within each PairID
:
tmp <- merge(
reshape(one, idvar="PairID", sep="", varying=-1, direction="long"),
two,
by = "PairElement"
)
tmp <- tmp[order(tmp$PairID, -tmp$time),]
aggregate(cbind(var1,var2,var3,var4) ~ PairID, data = tmp, FUN=diff)
# PairID var1 var2 var3 var4
#1 1 0 -4 -4 2
#2 2 9 -1 -3 -9
#3 3 5 4 -2 -2
#4 4 -2 3 3 7
#5 5 7 6 4 -4
#6 6 -7 2 -3 5
In dplyr/tidyr speak, something like:
library(dplyr)
library(tidyr)
one %>%
pivot_longer(-PairID, values_to="PairElement") %>%
right_join(two, by="PairElement") %>%
group_by(PairID) %>%
arrange(desc(name)) %>%
select(-name, -PairElement) %>%
summarise_all(diff)
Upvotes: 2