Karol Daniluk
Karol Daniluk

Reputation: 544

How to match pairs and merge data frame by values in two columns?

Let's say I have a dataframe

x    y    val
A    B    5
A    C    3
B    A    7
B    C    9
C    A    1

As you can see there are two pairs matching by x and y:

Pair 1: A B 5 and B A 7

Pair 2: A C 3 and C A 1

I would like to merge them to A B 12 and A C 4 and leave the B C 9 as it doesn't have a pair (C B).

The final dataframe should look like this:

x    y    val
A    B    12
A    C    4
B    C    9

How can I achieve this in R?

Upvotes: 0

Views: 998

Answers (3)

MKR
MKR

Reputation: 20095

One can group by row_number() to sort and combine columns in sorted order to create a order independent pair.

Note: Below solution can be evolve to work for more than 2 columns pairing as well. e.g.treating A B C, A C B or B C A as same group.

library(dplyr)
library(tidyr)
df %>%
  group_by(row_number()) %>%
  mutate(xy = paste0(sort(c(x,y)),collapse=",")) %>%
  group_by(xy) %>%
  summarise(val = sum(val)) %>% 
  separate(xy, c("x","y"))

## A tibble: 3 x 3
#  x     y       val
#* <chr> <chr> <int>
#1 A     B        12
#2 A     C         4
#3 B     C         9

Data:

df <- read.table(text = 
"x    y    val
A    B    5
A    C    3
B    A    7
B    C    9
C    A    1",
header = TRUE, stringsAsFactors = FALSE)

Upvotes: 0

G. Grothendieck
G. Grothendieck

Reputation: 270170

First ensure that x and y are character giving DF_c and then sort them giving DF_s. Finally perform the aggregation. No packages are used. The first line would not be needed if x and y were already character.

DF_c <- transform(DF, x = as.character(x), y = as.character(y))
DF_s <- transform(DF_c, x = pmin(x, y), y = pmax(x, y))
aggregate(val ~ x + y, DF_s, sum)

giving:

  x y val
1 A B  12
2 A C   4
3 B C   9

Upvotes: 2

tyluRp
tyluRp

Reputation: 4768

Here's one solution with dplyr:

library(dplyr)

df %>% 
  mutate(var = paste(pmin(x, y), pmax(x, y))) %>% 
  group_by(var) %>% 
  summarise(val = sum(val))
# A tibble: 3 x 2
  var     val
  <chr> <int>
1 A B      12
2 A C       4
3 B C       9

Add separate(var, c("x", "y")) to the end of the chain if you want the x and y columns as Melissa Key mentions.

Upvotes: 3

Related Questions