Mark Miller
Mark Miller

Reputation: 13103

Position of elements from one vector in another vector with R

I wish to create a vector holding the position of elements from one vector in another vector. This is similar to the following questions:

Get the index of the values of one vector in another?

Is there an R function for finding the index of an element in a vector?

The match function in base R works in the simplest case, shown here:

a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,2,3,4,5)
desired.output <- c(1,3,5,7,9)
match(b,a)
#[1] 1 3 5 7 9

However, match does not appear to work in more complex cases shown below. I might need a combination of which and match. In every case I am considering so far a value in b does not appear more often in b than in a. I need a base R solution.

a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,2,2,3,4,5)
desired.output <- c(1,3,4,5,7,9)

a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,2,2,3,4,4,5)
desired.output <- c(1,3,4,5,7,8,9)

a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,2,2,3,4,4,5,5)
desired.output <- c(1,3,4,5,7,8,9,10)

a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,1,2,2,3,4,4,5,5)
desired.output <- c(1,2,3,4,5,7,8,9,10)

a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,1,2,2,3,3,4,4,5,5)
desired.output <- c(1,2,3,4,5,6,7,8,9,10)

Upvotes: 6

Views: 1987

Answers (1)

GKi
GKi

Reputation: 39647

For the given cases pmatch will give the desired result. match in combination with make.unique will also work. If speed matters split could be used or a function using Rcpp is possible. Have also a look at Efficiently match all values of a vector in another vector.

pm <- function(x, y) {
    a <- split(seq_along(x), x)
    b <- split(seq_along(y), y)[names(a)]
    b[lengths(b)==0] <- NA
    b <- unlist(Map(`length<-`, b, lengths(a)), FALSE, FALSE)
    `[<-`(b, unlist(a, FALSE, FALSE), b) }

Rcpp::sourceCpp(code=r"(
#include <Rcpp.h>
#include <unordered_map>
#include <queue>

using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector pmC(NumericVector a, NumericVector b) {
  IntegerVector idx(no_init(a.size()));
  std::unordered_map<float, std::queue<int> > lut;
  for(int i = 0; i < b.size(); ++i) lut[b[i]].push(i);
  for(int i = 0; i < idx.size(); ++i) {
    auto search = lut.find(a[i]);
    if(search != lut.end() && search->second.size() > 0) {
      idx[i] = search->second.front() + 1;
      search->second.pop();
    } else {idx[i] = NA_INTEGER;}
  }
  return idx;
}
)")
a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,2,3,4,5)
pmatch(b,a)
match(make.unique(as.character(b)), make.unique(as.character(a)))
pm(b,a)
pmC(b,a)
#[1] 1 3 5 7 9

a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,2,2,3,4,5)
pmatch(b,a)
match(make.unique(as.character(b)), make.unique(as.character(a)))
pm(b,a)
pmC(b,a)
#[1] 1 3 4 5 7 9

a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,2,2,3,4,4,5)
pmatch(b,a)
match(make.unique(as.character(b)), make.unique(as.character(a)))
pm(b,a)
pmC(b,a)
#[1] 1 3 4 5 7 8 9

a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,2,2,3,4,4,5,5)
pmatch(b,a)
match(make.unique(as.character(b)), make.unique(as.character(a)))
pm(b,a)
pmC(b,a)
#[1]  1  3  4  5  7  8  9 10

a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,1,2,2,3,4,4,5,5)
pmatch(b,a)
match(make.unique(as.character(b)), make.unique(as.character(a)))
pm(b,a)
pmC(b,a)
#[1]  1  2  3  4  5  7  8  9 10

a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,1,2,2,3,3,4,4,5,5)
pmatch(b,a)
match(make.unique(as.character(b)), make.unique(as.character(a)))
pm(b,a)
pmC(b,a)
# [1]  1  2  3  4  5  6  7  8  9 10

Upvotes: 6

Related Questions