Reputation: 6455
I have a data frame that looks like this:
library(dplyr)
df <- data_frame(doc.x = c("a", "b", "c", "d"),
                 doc.y = c("b", "a", "d", "c"))
So that df is:
Source: local data frame [4 x 2]
doc.x doc.y
(chr) (chr)
1 a b
2 b a
3 c d
4 d c
This is a list of ordered pairs: a to b but also b to a, and so on. What is a dplyr-like way to return only the unordered pairs in this data frame? I.e.:
doc.x doc.y
(chr) (chr)
1 a b
2 c d
Upvotes: 4
Reputation: 6372
An alternative using data.table:
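df <- data.frame(doc.x = c("a", "b", "c", "d"),
                 doc.y = c("b", "a", "d", "c"), stringsAsFactors = FALSE)
library(data.table)
setDT(df)
# add a row id so that max()/min() are evaluated one row at a time
df[, row := seq_len(.N)]
# put the larger value on the left and the smaller on the right
df <- df[, list(Left = max(doc.x, doc.y), Right = min(doc.x, doc.y)), by = row]
df <- df[, list(Left, Right)]
unique(df)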
Left Right
1: b a
2: d c
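The row id and `by = row` are there because max() and min() collapse all of their arguments into a single value. A small base R sketch of the difference from the elementwise pmax() (the vectors x and y are just the two columns of the example data, named here for illustration):

```r
x <- c("a", "b", "c", "d")
y <- c("b", "a", "d", "c")

# max() pools both vectors and returns one value
overall <- max(x, y)

# pmax() compares position by position and returns one value per row
elementwise <- pmax(x, y)

print(overall)      # "d"
print(elementwise)  # "b" "b" "d" "d"
```

Assuming the columns are plain character vectors, pmax() and pmin() can therefore replace the per-row grouping step.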
Upvotes: 2
Reputation: 6203
Using dplyr
# encode both columns as integer codes over a shared set of levels
lvls <- sort(unique(c(df$doc.x, df$doc.y)))
# find unique pairs
res <- df %>%
  mutate(x.lvl = match(doc.x, lvls),
         y.lvl = match(doc.y, lvls),
         pair = ifelse(x.lvl < y.lvl,
                       paste(doc.x, doc.y, sep=","),
                       paste(doc.y, doc.x, sep=","))) %>%
  .$pair %>%
  unique
Unique pairs
res
[1] "a,b" "c,d"
Edit
Inspired by Backlin's solution, in base R
unique(with(df, paste(pmin(doc.x, doc.y), pmax(doc.x, doc.y), sep=",")))
[1] "a,b" "c,d"
Or to store in a data.frame
unique(with(df, data.frame(lvl1=pmin(doc.x, doc.y), lvl2=pmax(doc.x, doc.y))))
lvl1 lvl2
1 a b
3 c d
Upvotes: 1
Reputation: 14852
Use pmin and pmax to sort each pair alphabetically, i.e. turn (b,a) into (a,b), and then filter away all the duplicates.
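df %>%
  mutate(dx = pmin(doc.x, doc.y), dy = pmax(doc.x, doc.y)) %>%
  # .keep_all = TRUE keeps doc.x and doc.y; newer dplyr versions otherwise return only dx and dy
  distinct(dx, dy, .keep_all = TRUE) %>%
  select(-dx, -dy)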
doc.x doc.y
(chr) (chr)
1 a b
2 c d
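For comparison, the same idea can be sketched in base R without any packages. This is only an illustration: the names key_lo, key_hi, keep and result are made up here, and the columns are assumed to be character vectors rather than factors.

```r
df <- data.frame(doc.x = c("a", "b", "c", "d"),
                 doc.y = c("b", "a", "d", "c"),
                 stringsAsFactors = FALSE)

# canonical form of each pair: smaller value first, larger value second
key_lo <- pmin(df$doc.x, df$doc.y)
key_hi <- pmax(df$doc.x, df$doc.y)

# keep the first row seen for each canonical pair
keep <- !duplicated(data.frame(key_lo, key_hi))
result <- df[keep, ]
print(result)
```

Like the dplyr version, this keeps the first occurrence of each pair in its original orientation, so (a,b) is returned rather than (b,a) because it appears first.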
Upvotes: 10