mindlessgreen

Reputation: 12112

R vector-vector matching with ordered indices

Here I have two string vectors whose order is important and cannot be changed.

vec1 <- c("carrot","carrot","carrot","apple","apple","mango","mango","cherry","cherry")
vec2 <- c("cherry","apple")

I wish to find out whether the elements of vec2 appear in vec1 and, if so, where (index/position) and in what order.

I tried which(vec1 %in% vec2), which gives 4 5 8 9. These are the correct indices, but in the wrong order. I also tried match(vec2,vec1), which gives 8 4. Only the first match of each element is returned, so this would work only if the elements of vec1 were unique.

Ideally, I am looking for this result: 8 9 4 5. cherry is first matched at pos 8 and 9 and then apple is matched at 4 and 5.

Is there a smart way to do this without resorting to loops?

Upvotes: 6

Views: 1081

Answers (2)

Mark

Reputation: 4537

which(!is.na(match(vec1,vec2)))[order(match(vec1,vec2)[!is.na(match(vec1,vec2))])]

Wow...there's probably an easier way to do this but...

> match(vec1,vec2)
[1] NA NA NA  2  2 NA NA  1  1

OK, so by swapping the arguments to match() (looking up each element of vec1 in vec2), I can use which() to get the indices where the result is not NA.

> which(!is.na(match(vec1,vec2)))
[1] 4 5 8 9

This gets the indices you want, but not in the order you want. Calling order() on the match() vector gives the permutation that re-sorts them into the desired order. Here, I match again and keep only the non-NA values.

> order(match(vec1,vec2)[!is.na(match(vec1,vec2))])
[1] 3 4 1 2

Subset by this and you get:

> which(!is.na(match(vec1,vec2)))[order(match(vec1,vec2)[!is.na(match(vec1,vec2))])]
[1] 8 9 4 5

If this is slow, save the result of match(vec1,vec2) in a variable first so that it is not recomputed three times.
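
As a rough sketch of that suggestion, the match can be computed once and reused. The names m, hit and result are only illustrative and do not come from the answer above; the last line shows what appears to be an equivalent shortcut, relying on order() dropping NA values when na.last = NA.

```r
vec1 <- c("carrot","carrot","carrot","apple","apple","mango","mango","cherry","cherry")
vec2 <- c("cherry","apple")

# Compute the match once: for each element of vec1, its position in vec2 (or NA)
m <- match(vec1, vec2)

# Positions in vec1 that matched something in vec2
hit <- which(!is.na(m))

# Reorder those positions by where their value sits in vec2;
# order() leaves ties in their original order, so 8 comes before 9
result <- hit[order(m[hit])]
result

# Possible shortcut: na.last = NA makes order() discard the NA entries
result_short <- order(m, na.last = NA)
result_short
```

Both result and result_short should come out as 8 9 4 5 for the vectors in the question.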

Upvotes: 1

Mamoun Benghezal

Reputation: 5314

You can try this:

unlist(lapply(vec2, function(x) which(vec1 %in% x)))
[1] 8 9 4 5

This returns, for each element of vec2 in turn, the positions in vec1 where that element occurs.
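
To see what the lapply() step produces before unlist() flattens it, here is a small sketch. The vector wanted (with an extra "kiwi" that is absent from vec1) and the name pos are assumptions made for illustration; they are not part of the original question.

```r
vec1 <- c("carrot","carrot","carrot","apple","apple","mango","mango","cherry","cherry")

# Hypothetical lookup vector; "kiwi" does not occur in vec1
wanted <- c("cherry", "kiwi", "apple")

# One list entry per lookup value, named after that value,
# holding the positions in vec1 where it occurs
pos <- lapply(setNames(wanted, wanted), function(x) which(vec1 == x))

pos$cherry   # positions of "cherry"
pos$kiwi     # empty integer vector, since there is no match
lengths(pos) # number of hits per lookup value

# Flattening drops the empty entry and keeps the order of 'wanted'
flat <- unlist(pos, use.names = FALSE)
flat
```

Values with no match contribute nothing to the flattened result, so flat should again be 8 9 4 5. Keeping the intermediate list around can be handy when you also need to know which value each index belongs to.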

Upvotes: 11
