Reputation:
My real data frame is much bigger than this one, but this example shows what I want to do:
x = data.frame(
  A = c(3, 4, 5, 7, 9),
  B = c(7, 8, 9, 3, 5),
  C = c(11, 12, 13, 14, 18)
)
I consider rows 1 and 4 to be the same, because for me the pairs (3,7) and (7,3) are equivalent (likewise rows 3 and 5, with the pairs (5,9) and (9,5)). With this criterion I would like to keep only one row per pair.
The result should be this:
x = data.frame(
  A = c(3, 4, 5),
  B = c(7, 8, 9),
  C = c(11, 12, 13)
)
How can I do this?
Is it possible to do this with the function subset?
Upvotes: 4
Views: 83
Reputation: 887058
Here is a base R option with pmin/pmax and duplicated:
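# cbind() puts both values in the key; with() only evaluates its first expression,
# so passing pmin() and pmax() as separate arguments would silently drop pmax()
x[!duplicated(with(x, cbind(pmin(A, B), pmax(A, B)))), ]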
A B C
#1 3 7 11
#2 4 8 12
#3 5 9 13
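To make the key explicit, here is a minimal sketch that rebuilds the question's data and de-duplicates on the (smaller, larger) pair. The names lo, hi, keep and result are only illustrative, and it assumes A and B are numeric columns without missing values.

```r
# Example data from the question
x <- data.frame(
  A = c(3, 4, 5, 7, 9),
  B = c(7, 8, 9, 3, 5),
  C = c(11, 12, 13, 14, 18)
)

lo <- pmin(x$A, x$B)  # element-wise smaller value of each pair
hi <- pmax(x$A, x$B)  # element-wise larger value of each pair

# duplicated() on a matrix compares whole rows, so both columns form the key
keep <- !duplicated(cbind(lo, hi))

# Keep the first occurrence of each unordered pair
result <- x[keep, ]
result
```

Using both lo and hi matters once the data gets bigger: a key built from the smaller value alone would treat (3,7) and (3,8) as the same pair.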
Upvotes: 1
Reputation: 4970
A base R solution. Use ifelse and paste0 to create a variable that combines A and B, and puts the smallest value first. Then you can use duplicated to identify duplicate values, and subset.
index <- ifelse(x$A<x$B, paste0(x$A, '-', x$B), paste0(x$B, '-', x$A))
index
[1] "3-7" "4-8" "5-9" "3-7" "5-9"
x[!duplicated(index),]
A B C
1 3 7 11
2 4 8 12
3 5 9 13
Since you mention subset: it does the same as [] here.
subset(x, !duplicated(index))
A B C
1 3 7 11
2 4 8 12
3 5 9 13
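If you would rather not create the intermediate index variable, the key can probably be built inside the subset call itself, since subset evaluates its condition with the data frame's columns in scope. A minimal sketch, assuming A and B are numeric; the name deduped is just illustrative:

```r
# Example data from the question
x <- data.frame(
  A = c(3, 4, 5, 7, 9),
  B = c(7, 8, 9, 3, 5),
  C = c(11, 12, 13, 14, 18)
)

# pmin()/pmax() order each pair, paste() turns it into one key per row,
# and duplicated() flags every repeat of a key after its first occurrence
deduped <- subset(x, !duplicated(paste(pmin(A, B), pmax(A, B), sep = "-")))
deduped
```

This is the same idea as the ifelse version, with pmin/pmax doing the ordering of each pair.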
Upvotes: 1
Reputation: 2856
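library(dplyr)
# Build an order-independent key per row: smaller value first, larger second.
# Grouping by (A, B) makes min()/max() work on one pair at a time.
x <- x %>%
  group_by(A, B) %>%
  mutate(AB = paste0(min(A, B), "-", max(A, B)))
# Keep the first row for each key and drop the helper column AB (column 4)
x[!duplicated(x$AB), -4]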
# # A tibble: 3 x 3
# # Groups: A, B [3]
# A B C
# <dbl> <dbl> <dbl>
# 1 3 7 11
# 2 4 8 12
# 3 5 9 13
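One caveat with string keys in general: when the two values are pasted together without a separator, different pairs can end up with the same key. A small sketch of the problem, using made-up values (1, 23) and (12, 3) that are not in the question's data:

```r
# Without a separator both pairs collapse to the key "123"
collide <- paste0(1, 23) == paste0(12, 3)

# With a separator the keys are "1-23" and "12-3", so they stay distinct
distinct_keys <- paste(1, 23, sep = "-") != paste(12, 3, sep = "-")

collide
distinct_keys
```

For single-digit values like the ones in the example this makes no difference, but on a larger data frame a separator is the safer choice.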
Upvotes: 2