Reputation: 11
I need to identfy pairs of numbers in a table in R. The structure of the table is as follows:
user_A user_B counter
1 1 2 5
2 1 3 3
3 2 1 10
4 2 4 8
I want to check if for each pair, an opposite one exists, e.g. pair (1,2) and pair (2,1), and if so - sum the counter value for both pairs. As a result, I want to have an output in a form of a table like this one:
user_A user_B sum bi_directional
1 1 2 15 TRUE
2 1 3 3 FALSE
3 2 4 8 FALSE
Thank you in advance!
Upvotes: 0
Views: 268
Reputation: 15917
This is a dplyr solution:
df %>%
mutate(user_A2 = pmin(user_A, user_B),
user_B = pmax(user_A, user_B),
user_A = user_A2) %>%
select(-user_A2) %>%
group_by(user_A, user_B) %>%
summarise(sum = sum(counter), bi_directional = n() > 1) %>%
as.data.frame
## user_A user_B sum bi_directional
## 1 1 2 15 TRUE
## 2 1 3 3 FALSE
## 3 2 4 8 FALSE
mutate()
is used to redefine user_A
and user_B
such that the smaller of the two values is always in the first column. Then the auxiliary column user_A2
is dropped. The data is grouped by user_A
and user_B
and two summaries are calculated for each group: The sum over counter
and whether there was more than one value. This latter uses n()
which counts the number of rows in the group.
The last line uses as.data.frame
to transform back to a data frame. This is only needed, if you insist on having a data frame as your result.
Upvotes: 0
Reputation: 887431
We can sort
the first two column row wise with apply
(MARGIN=1), cbind
with the third column ('d1'), get an index of duplicates of the first two column ('i1'). Convert to 'data.table' (setDT(d2)
), grouped by 'user_A', and 'user_B', get the sum
of 'Counter' and the first row of 'i1'.
d1 <- setNames(cbind(t(apply(df[1:2], 1, sort)), df[3]), names(df))
i1 <- duplicated(d1[1:2])|duplicated(d1[1:2], fromLast=TRUE)
d2 <- cbind(d1, i1)
library(data.table)
setDT(d2)[, list(counter=sum(counter), bi_directional=i1[1L]) ,.(user_A, user_B)]
# user_A user_B counter bi_directional
#1: 1 2 15 TRUE
#2: 1 3 3 FALSE
#3: 2 4 8 FALSE
Or another option is
setDT(df)[user_A > user_B, c('user_B', 'user_A') :=
list(user_A, user_B)]
df[, list(counter= sum(counter), bi_directional= .N>1),
by = .(user_A, user_B)]
# user_A user_B counter bi_directional
#1: 1 2 15 TRUE
#2: 1 3 3 FALSE
#3: 2 4 8 FALSE
Upvotes: 1