Reputation: 2166
I have a data.frame with two variables, id.x and id.y, whose combination uniquely identifies each row, but whose individual values are repeated many times in the dataset. I would like to use dplyr to group_by id.x such that each id.x is matched with a distinct id.y.
Edit: edited the example to highlight the differing number of unique id.x and id.y values.
An example:
id.x id.y
a o
a p
a q
c o
c p
c q
Would return:
id.x id.y
a o
c q
dput output for the example:
structure(list(id.x = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("a",
"c"), class = "factor"), id.y = structure(c(1L, 2L, 3L, 1L, 2L,
3L), .Label = c("o", "p", "q"), class = "factor")), .Names = c("id.x",
"id.y"), row.names = c(NA, -6L), class = "data.frame")
Edit: If my desired result could be accomplished without the use of group_by or distinct, that is fine too! I also use data.table, and a data.table solution would be fine.
Upvotes: 9
Views: 326
Reputation: 4367
Using dplyr:
library(dplyr)
df %>% filter(dense_rank(id.x) == dense_rank(id.y))
which returns a valid pairing (note that c is paired with p here, not q as in the question's example):
id.x id.y
1 a o
2 c p
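To see why the filter picks these rows, here is a minimal sketch that exposes the intermediate ranks. The column names rank.x and rank.y are illustrative only and not part of the answer above. The approach pairs the k-th distinct id.x with the k-th distinct id.y, so it assumes that pair actually occurs in the data and that there are at least as many distinct id.y values as id.x values; otherwise some id.x would be dropped.

```r
library(dplyr)

# Rebuild the example data from the question
df <- data.frame(
  id.x = factor(c("a", "a", "a", "c", "c", "c")),
  id.y = factor(c("o", "p", "q", "o", "p", "q"))
)

# Attach the dense rank of each id within its own column
# (rank.x and rank.y are hypothetical helper columns for illustration)
ranked <- df %>%
  mutate(rank.x = dense_rank(id.x),
         rank.y = dense_rank(id.y))

# Keep only rows where the two ranks agree, then drop the helper columns
result <- ranked %>%
  filter(rank.x == rank.y) %>%
  select(id.x, id.y)

print(ranked)
print(result)
```

In the example, rank.x is 1 for a and 2 for c, while rank.y is 1, 2, 3 for o, p, q, so only the rows (a, o) and (c, p) survive the filter.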
Upvotes: 1