Reputation: 13103
I wish to create a vector
holding the position of elements
from one vector
in another vector
. This is similar to the following questions:
Get the index of the values of one vector in another?
Is there an R function for finding the index of an element in a vector?
The match
function in base R
works in the simplest case, shown here:
a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,2,3,4,5)
desired.output <- c(1,3,5,7,9)
match(b,a)
#[1] 1 3 5 7 9
However, match
does not appear to work in more complex cases shown below. I might need a combination of which
and match
. In every case I am considering so far a value in b
does not appear more often in b
than in a
. I need a base R
solution.
a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,2,2,3,4,5)
desired.output <- c(1,3,4,5,7,9)
a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,2,2,3,4,4,5)
desired.output <- c(1,3,4,5,7,8,9)
a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,2,2,3,4,4,5,5)
desired.output <- c(1,3,4,5,7,8,9,10)
a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,1,2,2,3,4,4,5,5)
desired.output <- c(1,2,3,4,5,7,8,9,10)
a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,1,2,2,3,3,4,4,5,5)
desired.output <- c(1,2,3,4,5,6,7,8,9,10)
Upvotes: 6
Views: 1987
Reputation: 39647
For the given cases pmatch
will give the desired result. match
in combination with make.unique
will also work. If speed matters split
could be used or a function using Rcpp
is possible. Have also a look at Efficiently match all values of a vector in another vector.
pm <- function(x, y) {
a <- split(seq_along(x), x)
b <- split(seq_along(y), y)[names(a)]
b[lengths(b)==0] <- NA
b <- unlist(Map(`length<-`, b, lengths(a)), FALSE, FALSE)
`[<-`(b, unlist(a, FALSE, FALSE), b) }
Rcpp::sourceCpp(code=r"(
#include <Rcpp.h>
#include <unordered_map>
#include <queue>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector pmC(NumericVector a, NumericVector b) {
IntegerVector idx(no_init(a.size()));
std::unordered_map<float, std::queue<int> > lut;
for(int i = 0; i < b.size(); ++i) lut[b[i]].push(i);
for(int i = 0; i < idx.size(); ++i) {
auto search = lut.find(a[i]);
if(search != lut.end() && search->second.size() > 0) {
idx[i] = search->second.front() + 1;
search->second.pop();
} else {idx[i] = NA_INTEGER;}
}
return idx;
}
)")
a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,2,3,4,5)
pmatch(b,a)
match(make.unique(as.character(b)), make.unique(as.character(a)))
pm(b,a)
pmC(b,a)
#[1] 1 3 5 7 9
a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,2,2,3,4,5)
pmatch(b,a)
match(make.unique(as.character(b)), make.unique(as.character(a)))
pm(b,a)
pmC(b,a)
#[1] 1 3 4 5 7 9
a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,2,2,3,4,4,5)
pmatch(b,a)
match(make.unique(as.character(b)), make.unique(as.character(a)))
pm(b,a)
pmC(b,a)
#[1] 1 3 4 5 7 8 9
a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,2,2,3,4,4,5,5)
pmatch(b,a)
match(make.unique(as.character(b)), make.unique(as.character(a)))
pm(b,a)
pmC(b,a)
#[1] 1 3 4 5 7 8 9 10
a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,1,2,2,3,4,4,5,5)
pmatch(b,a)
match(make.unique(as.character(b)), make.unique(as.character(a)))
pm(b,a)
pmC(b,a)
#[1] 1 2 3 4 5 7 8 9 10
a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,1,2,2,3,3,4,4,5,5)
pmatch(b,a)
match(make.unique(as.character(b)), make.unique(as.character(a)))
pm(b,a)
pmC(b,a)
# [1] 1 2 3 4 5 6 7 8 9 10
Upvotes: 6