citronrose

Reputation: 409

Find non-first match in R

I know that match(x, y) returns, for each element of x, the position of its first match in y.

Assuming that x may contain the same value multiple times, I am looking for a concise way to match the nth occurrence of a value in x with the nth occurrence of that value in y.

For example:

x <- c(3,4,4,3,2,4)
y <- c(1,2,3,4,1,2,3,4)

my.match(x, y)
## 3,4,8,7,2,NA

Upvotes: 3

Views: 70

Answers (2)

citronrose

Reputation: 409

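The following function is much faster when the vectors are large and contain many repeated values, because it does not loop over x one element at a time. Each pass calls the vectorized match() on the elements that are still unpaired, so the number of passes depends on how often a value repeats rather than on the length of x.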

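my.match <- function(x, y) {
  fidx <- rep(FALSE, length(x))   # TRUE once an element of x has been paired
  fidy <- rep(FALSE, length(y))   # TRUE once an element of y has been used
  ret <- rep(NA_integer_, length(x))
  repeat {
    nidx <- which(!fidx)              # positions in x still unpaired
    nidy <- which(!fidy)              # positions in y still unused
    idx <- match(x[nidx], y[nidy])    # first unused match in y for each remaining x
    idy <- match(y[nidy], x[nidx])    # first unpaired match in x for each remaining y
    ret[nidx] <- nidy[idx]            # provisional; later duplicates are overwritten in later passes
    fidx[nidx[unique(idy)]] <- TRUE   # first remaining occurrence of each shared value in x
    fidy[nidy[unique(idx)]] <- TRUE   # the positions in y those occurrences consumed
    # Stop once a pass finds no new pairs
    if (all(is.na(idx)) || all(is.na(idy))) {
      break
    }
  }
  ret
}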

Benchmarking against the for-loop method from the other answer (wrapped here as my.match1) yields:

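my.match1 <- function(x, y) {
  idx <- c()
  for (i in x) {
    k <- match(i, y)   # first remaining occurrence of i in y
    idx <- c(idx, k)
    y[k] <- NA         # mark it as used
  }
  return(idx)
}
# 10000 draws from 1:100, so every value repeats many times
x <- sample.int(100, 10000, replace = TRUE)
y <- sample.int(100, 10000, replace = TRUE)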
system.time(my.match1(x,y))
##  user  system elapsed 
## 1.016   0.003   1.020 
system.time(my.match(x,y))
## user  system elapsed 
## 0.049   0.000   0.049
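
As a point of comparison, the same nth-to-nth pairing can also be sketched without any explicit loop, by tagging every value with its occurrence number and calling match() once on the combined keys. This is a different technique from the function above, the name occ.match is hypothetical, and it was not part of the timings shown here, so no claim is made about how it compares. It builds the keys with paste(), so it assumes the values survive conversion to character without two distinct values collapsing into the same string.

```r
# Hypothetical helper (not from the answers in this thread): pair the nth
# occurrence of a value in x with the nth occurrence of that value in y.
occ.match <- function(x, y) {
  # Running count per value: 1 for its first occurrence, 2 for its second, ...
  occ.x <- ave(seq_along(x), x, FUN = seq_along)
  occ.y <- ave(seq_along(y), y, FUN = seq_along)
  # Combine value and occurrence number into one key, then match the keys once
  match(paste(x, occ.x), paste(y, occ.y))
}

x <- c(3, 4, 4, 3, 2, 4)
y <- c(1, 2, 3, 4, 1, 2, 3, 4)
occ.match(x, y)
# [1]  3  4  8  7  2 NA
```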

Upvotes: 0

chinsoon12

Reputation: 25225

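Use a for loop: for each element of x, find its first match in y, store that index, and then overwrite the matched position in y with NA so it cannot be matched again. Note that this modifies y.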

idx <- c()
for (i in x) {
    k <- match(i, y)
    idx <- c(idx, k)
    y[k] <- NA
}
idx

#[1]  3  4  8  7  2 NA
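
If the side effect on y is unwanted, the loop above can be wrapped in a function, since R then only modifies the function's local copy of y. The following is a minimal sketch under that assumption; the name loop.match is hypothetical, and the result vector is preallocated rather than grown with c(). Like the loop above, it marks used positions with NA, so it assumes x and y contain no NA values of their own.

```r
# Hypothetical wrapper around the loop above; loop.match is an illustrative name.
loop.match <- function(x, y) {
  idx <- rep(NA_integer_, length(x))   # preallocate; unmatched entries stay NA
  for (i in seq_along(x)) {
    k <- match(x[i], y)                # first remaining occurrence in y
    if (!is.na(k)) {
      idx[i] <- k
      y[k] <- NA                       # consume it (local copy of y only)
    }
  }
  idx
}

x <- c(3, 4, 4, 3, 2, 4)
y <- c(1, 2, 3, 4, 1, 2, 3, 4)
loop.match(x, y)
# [1]  3  4  8  7  2 NA
y
# [1] 1 2 3 4 1 2 3 4
```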

Upvotes: 1
