coding_heart

Reputation: 1295

Identifying unique duplicates in vector in R

I am trying to identify duplicates based on matching elements across two vectors. Using duplicated() gives a logical vector flagging every row that has a match, but I would like to index which rows match each other. Using the following code as an example:

x <- c(1,6,4,6,4,4)             
y <- c(3,2,5,2,5,5)         

frame <- data.frame(x,y)        
matches <- duplicated(frame) | duplicated(frame, fromLast = TRUE)   
matches
[1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE

Ultimately, I would like to create a vector that identifies elements 2 and 4 as matching each other, and likewise elements 3, 5 and 6. Any thoughts are greatly appreciated.

Upvotes: 1

Views: 188

Answers (3)

Troy

Reputation: 8691

How about this with plyr::ddply()

library(plyr)  # provides ddply(), .() and summarise()
ddply(cbind(index=1:nrow(frame),frame),.(x,y),summarise,count=length(index),elems=paste0(index,collapse=","))

  x y count elems
1 1 3     1     1
2 4 5     3 3,5,6
3 6 2     2   2,4

NB: the expression cbind(index=1:nrow(frame),frame) just adds a row-index column to the frame, so each row carries its original position into the summary.
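For comparison, the same grouping of row indices can be sketched in base R without plyr. This is only an illustration, not part of the answer above: the name `groups` is made up here, and it relies on split() accepting a list of grouping vectors, which it combines with interaction().

```r
x <- c(1, 6, 4, 6, 4, 4)
y <- c(3, 2, 5, 2, 5, 5)
frame <- data.frame(x, y)

# Split the row indices by each (x, y) combination.
# drop = TRUE discards combinations that never occur in the data.
groups <- split(seq_len(nrow(frame)), list(frame$x, frame$y), drop = TRUE)

# Each list element is named "<x>.<y>" and holds the matching row indices
groups[["6.2"]]
groups[["4.5"]]
```

The list form is handy when the indices are needed as numbers rather than as a comma-separated string.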

Upvotes: 1

Blue Magister

Reputation: 13363

Another data.table answer, using the group counter .GRP to assign every distinct element a label:

library(data.table)
d <- data.table(frame)
d[, z := .GRP, by = list(x, y)]  # adds column z by reference
d
#    x y z
# 1: 1 3 1
# 2: 6 2 2
# 3: 4 5 3
# 4: 6 2 2
# 5: 4 5 3
# 6: 4 5 3
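If data.table is not available, a similar label vector can be sketched in base R. This is a rough equivalent rather than what .GRP does internally: it assumes that pasting the columns into a single string key is acceptable for the data, and the names `key` and `z` are chosen here for illustration.

```r
x <- c(1, 6, 4, 6, 4, 4)
y <- c(3, 2, 5, 2, 5, 5)
frame <- data.frame(x, y)

# Build one string key per row; the separator keeps e.g. (1, 23) and (12, 3) distinct
key <- paste(frame$x, frame$y, sep = "_")

# Label each row with the position of its key among the unique keys,
# which are ordered by first appearance
z <- match(key, unique(key))
z
# [1] 1 2 3 2 3 3
```

Rows sharing a label are duplicates of each other, so elements 2 and 4 get label 2 and elements 3, 5 and 6 get label 3.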

Upvotes: 4

thelatemail

Reputation: 93813

Label each unique row with a group number, then merge those labels back onto the original frame:

labls <- data.frame(unique(frame),num=1:nrow(unique(frame)))
result <- merge(transform(frame,row = 1:nrow(frame)),labls,by=c("x","y"))
result[order(result$row),]

#  x y row num
#1 1 3   1   1
#5 6 2   2   2
#2 4 5   3   3
#6 6 2   4   2
#3 4 5   5   3
#4 4 5   6   3

Once the rows are sorted back into their original order by row, the num column gives the group of each element.

Upvotes: 1
