bjoseph

Reputation: 2166

Combine group_by and distinct

I have a data.frame with two variables, id.x and id.y. Their combination uniquely identifies each row, but each individual value is repeated many times in the dataset.

I would like to use dplyr to group_by id.x such that each id.x is matched with a distinct id.y.

Edit: I edited the example to highlight that the numbers of unique id.x and id.y values differ.

An example:

  id.x id.y
    a    o
    a    p
    a    q
    c    o
    c    p
    c    q

Would return:

  id.x id.y
    a    o
    c    q

dput for example:

structure(list(id.x = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("a", 
"c"), class = "factor"), id.y = structure(c(1L, 2L, 3L, 1L, 2L, 
3L), .Label = c("o", "p", "q"), class = "factor")), .Names = c("id.x", 
"id.y"), row.names = c(NA, -6L), class = "data.frame")

Edit: If my desired result can be accomplished without group_by or distinct, that is fine too! I also use data.table, so a data.table solution would be fine as well.

Upvotes: 9

Views: 326

Answers (1)

manotheshark

Reputation: 4367

Using dplyr

library(dplyr)
df %>% filter(dense_rank(id.x) == dense_rank(id.y))

which returns

  id.x id.y
1    a    o
2    c    p
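To see why this filter works, it may help to make the ranks visible. The sketch below rebuilds the example data (as character columns rather than factors, which is an assumption made for brevity) and adds two helper columns, rank.x and rank.y, which are hypothetical names used only for illustration. dense_rank numbers the distinct values of each column 1, 2, 3, ... in sorted order, so the filter keeps the rows where the k-th id.x meets the k-th id.y.

```r
library(dplyr)

# Example data from the question, rebuilt with character columns (assumption)
df <- data.frame(
  id.x = c("a", "a", "a", "c", "c", "c"),
  id.y = c("o", "p", "q", "o", "p", "q"),
  stringsAsFactors = FALSE
)

# Expose the ranks that the filter compares (helper columns are illustrative)
ranked <- df %>%
  mutate(rank.x = dense_rank(id.x),   # a = 1, c = 2
         rank.y = dense_rank(id.y))   # o = 1, p = 2, q = 3

# Keep rows where the ranks agree: a with o (both 1), c with p (both 2)
matched <- ranked %>%
  filter(rank.x == rank.y) %>%
  select(id.x, id.y)

print(matched)
```

Note that this relies on every needed combination actually being present in the data, and on there being at least as many distinct id.y values as id.x values. An id.x whose rank exceeds the number of distinct id.y values would be dropped silently.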

Upvotes: 1
