Reputation: 409
I know that match(x, y) returns, for each element of x, the position of its first match in y.
Assuming that x may contain the same value multiple times, I am looking for a concise way to match the nth occurrence of a value in x with the nth occurrence of that value in y.
For example:
x <- c(3,4,4,3,2,4)
y <- c(1,2,3,4,1,2,3,4)
my.match(x, y)
## 3,4,8,7,2,NA
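For contrast, this is what plain match() gives on the same vectors. It is only a small illustration of the behaviour described above; the variable name first.only is made up for the example.

```r
x <- c(3,4,4,3,2,4)
y <- c(1,2,3,4,1,2,3,4)

# match() always reports the first position in y, so repeated
# values in x all map to the same index
first.only <- match(x, y)
first.only
## 3,4,4,3,2,4
```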
Upvotes: 3
Views: 70
Reputation: 409
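The following function is much faster on large vectors because it does not loop over x one element at a time. Each pass of the repeat loop uses vectorized match() calls to pair the first still-unmatched occurrence of every value in x with the first still-unused occurrence in y.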
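my.match <- function(x, y) {
  fidx <- rep(FALSE, length(x))  # TRUE once an element of x has been matched
  fidy <- rep(FALSE, length(y))  # TRUE once an element of y has been used
  ret <- rep(NA, length(x))
  repeat {
    nidx <- which(!fidx)  # positions in x still unmatched
    nidy <- which(!fidy)  # positions in y still unused
    idx <- match(x[nidx], y[nidy])
    idy <- match(y[nidy], x[nidx])
    ret[nidx] <- nidy[idx]
    # mark the first remaining occurrence of each matched value as done
    # (NA indices are ignored here because the assigned value has length 1)
    fidx[nidx[unique(idy)]] <- TRUE
    fidy[nidy[unique(idx)]] <- TRUE
    # stop once a pass finds no new matches
    if (all(is.na(idx)) || all(is.na(idy))) {
      break
    }
  }
  return(ret)
}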
Benchmarking it against the for-loop approach from the other answer gives:
my.match1 <- function(x, y) {
  idx <- c()
  for (i in x) {
    k <- match(i, y)
    idx <- c(idx, k)
    y[k] <- NA
  }
  return(idx)
}
x <- sample.int(100, 10000, replace = TRUE)
y <- sample.int(100, 10000, replace = TRUE)
system.time(my.match1(x,y))
## user system elapsed
## 1.016 0.003 1.020
system.time(my.match(x,y))
## user system elapsed
## 0.049 0.000 0.049
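A rough way to see where the speedup may come from: the for loop calls match() once per element of x, while the repeat-based version needs at most about one pass per occurrence of the most frequent value. The sketch below only counts those two quantities for input shaped like the benchmark. The set.seed(1) call and the names n.loop.calls and n.pass.bound are assumptions added for illustration, and the bound is an estimate, not a measurement.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
x <- sample.int(100, 10000, replace = TRUE)
y <- sample.int(100, 10000, replace = TRUE)

# the for loop issues one match() call per element of x
n.loop.calls <- length(x)

# each repeat pass pairs at most one occurrence of every distinct value,
# so the number of passes is bounded by the largest multiplicity of a
# value in x, plus one final pass that finds nothing left to match
n.pass.bound <- max(table(x)) + 1

c(loop.calls = n.loop.calls, pass.bound = n.pass.bound)
```

With only 100 distinct values spread over 10000 elements, the pass bound is far smaller than the number of loop iterations, which is consistent with the timings above.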
Upvotes: 0
Reputation: 25225
Use a for loop: match each element of x against y, store the index, and overwrite the matched position in y with NA so it cannot be matched again.
idx <- c()
for (i in x) {
  k <- match(i, y)
  idx <- c(idx, k)
  y[k] <- NA
}
idx
#[1] 3 4 8 7 2 NA
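The same loop can be wrapped in a function that preallocates its result instead of growing idx with c(). This is a minimal sketch of that variation, not part of the original answer; the name match.nth is hypothetical, and it assumes x contains no NA values (an NA in x would match positions already overwritten with NA).

```r
# hypothetical helper: match the nth occurrence in x with the nth occurrence in y
match.nth <- function(x, y) {
  idx <- rep(NA_integer_, length(x))  # preallocate the result
  for (i in seq_along(x)) {
    k <- match(x[i], y)        # first position in y not yet consumed
    idx[i] <- k
    if (!is.na(k)) y[k] <- NA  # consume it so it cannot be matched again
  }
  idx
}

x <- c(3,4,4,3,2,4)
y <- c(1,2,3,4,1,2,3,4)
match.nth(x, y)
# expected for the question's example: 3 4 8 7 2 NA
```

Because y is modified only inside the function, the caller's copy of y is left untouched.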
Upvotes: 1