Reputation: 544
Let's say I have a dataframe
x y val
A B 5
A C 3
B A 7
B C 9
C A 1
As you can see there are two pairs matching by x
and y
:
Pair 1: A B 5
and B A 7
Pair 2: A C 3
and C A 1
I would like to merge them to A B 12
and A C 4
and leave the B C 9
as it doesn't have a pair (C B
).
The final dataframe should look like this:
x y val
A B 12
A C 4
B C 9
How can I achieve this in R?
Upvotes: 0
Views: 998
Reputation: 20095
One can group by row_number()
to sort
and combine columns in sorted order to create a order independent pair
.
Note: Below solution can be evolve to work for more than 2 columns pairing as well. e.g.treating A B C
, A C B
or B C A
as same group.
library(dplyr)
library(tidyr)
df %>%
group_by(row_number()) %>%
mutate(xy = paste0(sort(c(x,y)),collapse=",")) %>%
group_by(xy) %>%
summarise(val = sum(val)) %>%
separate(xy, c("x","y"))
## A tibble: 3 x 3
# x y val
#* <chr> <chr> <int>
#1 A B 12
#2 A C 4
#3 B C 9
Data:
df <- read.table(text =
"x y val
A B 5
A C 3
B A 7
B C 9
C A 1",
header = TRUE, stringsAsFactors = FALSE)
Upvotes: 0
Reputation: 270170
First ensure that x
and y
are character giving DF_c
and then sort them giving DF_s
. Finally perform the aggregation. No packages are used. The first line would not be needed if x
and y
were already character.
DF_c <- transform(DF, x = as.character(x), y = as.character(y))
DF_s <- transform(DF_c, x = pmin(x, y), y = pmax(x, y))
aggregate(val ~ x + y, DF_s, sum)
giving:
x y val
1 A B 12
2 A C 4
3 B C 9
Upvotes: 2
Reputation: 4768
Here's one solution with dplyr
:
library(dplyr)
df %>%
mutate(var = paste(pmin(x, y), pmax(x, y))) %>%
group_by(var) %>%
summarise(val = sum(val))
# A tibble: 3 x 2 var val <chr> <int> 1 A B 12 2 A C 4 3 B C 9
Add separate(var, c("x", "y"))
to the end of the chain if you want the x
and y
columns as Melissa Key mentions.
Upvotes: 3