Janosch

Reputation: 480

Matching elements of two vectors based on proximity

I have two vectors:

a <- c(268, 1295, 1788, 2019, 2422)
b <- c(266,  952, 1295, 1791, 2018)

I want to match the elements of b to the elements of a, based on the smallest difference. So a[1] would be matched to b[1]. However, each element can only be matched to a single other element. It is possible that elements cannot be matched. If two elements of b have the smallest difference to the same element in a, then the element with the smaller difference is matched.

For example, 952 and 1295 are both closest to a[2]. Since 1295 is closer (in this case even equal) to a[2], a[2] is matched with 1295 and 952 stays unmatched. The final solution for this particular example should look like this:

268   NA 1295 1788 2019 2422
266  952 1295 1791 2018   NA

Some of the items are not matched. Although it would be possible to match 952 and 2422, the code I need would not consider them a match because matches were found in between them. The vectors are also strictly increasing.

With my coding abilities I would use tons of if statements to solve this. But I was wondering whether this is a known problem whose terminology I am just not aware of, or whether someone has an idea for an elegant solution.

Upvotes: 1

Views: 126

Answers (1)

Julius Vainora

Reputation: 48211

A base R approach, although probably not the most elegant one:

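# Distances |a[i] - b[j]|; for each b (column) keep the minimum and the index of the nearest a
aux1 <- apply(abs(outer(a, b, `-`)), 2, function(r) c(min(r), which.min(r)))
colnames(aux1) <- seq_along(b)
# For each a, among the b's that are nearest to it, keep the index of the closest b (NA if none)
aux2 <- tapply(aux1[1, ], factor(aux1[2, ], levels = seq_along(a)),
               function(x) as.numeric(names(which.min(x))))
# Matched pairs first, then the b's that were left unmatched
rbind(cbind(a, b = b[aux2]), cbind(a = NA, b = b[-aux2[!is.na(aux2)]]))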
#         a    b
# [1,]  268  266
# [2,] 1295 1295
# [3,] 1788 1791
# [4,] 2019 2018
# [5,] 2422   NA
# [6,]   NA  952

Here aux1 has one column per element of b: the 2nd row holds the index of the closest element of a, and the 1st row holds the corresponding distance.

aux1
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    2  343    0    3    1
# [2,]    1    2    2    3    4
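
To see where these numbers come from, it may help to look at the full distance matrix first. This is only an illustrative sketch on the vectors from the question; the names d, nearest_a and nearest_d are mine and are not used in the code above.

```r
a <- c(268, 1295, 1788, 2019, 2422)
b <- c(266, 952, 1295, 1791, 2018)

# Rows correspond to elements of a, columns to elements of b.
d <- abs(outer(a, b, `-`))
dimnames(d) <- list(a = a, b = b)
d

# For every column (element of b): index of the nearest a, and that distance.
nearest_a <- apply(d, 2, which.min)
nearest_d <- apply(d, 2, min)
```

Here nearest_a and nearest_d carry the same information as the 2nd and 1st row of aux1, just as two separate vectors.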

Then aux2 may already be enough for your purposes.

aux2
#  1  2  3  4  5 
#  1  3  4  5 NA 

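aux1 showed a tie (both b[2] and b[3] are closest to a[2]), but aux2 resolves it: the names are indices of a, and the values (2nd row) are the indices of the elements of b assigned to them, with NA where nothing was matched. In the last line of the code we then bind the remaining, unmatched elements of b.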
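
If you need the rows in the interleaved order shown in the question, one option is to wrap the same idea in a function and sort the rows afterwards. This is a rough sketch rather than a tested general solution: match_nearest is a hypothetical helper name, and sorting by "a if present, otherwise b" is my assumption about the desired layout.

```r
match_nearest <- function(a, b) {
  # Distance from every a (rows) to every b (columns).
  d <- abs(outer(a, b, `-`))
  # For each b: index of its nearest a, and that distance.
  nearest <- apply(d, 2, which.min)
  dist <- apply(d, 2, min)
  # For each a: among the b's that chose it, keep the closest one (NA if none).
  pick <- vapply(seq_along(a), function(i) {
    cand <- which(nearest == i)
    if (length(cand) == 0L) NA_integer_ else cand[which.min(dist[cand])]
  }, integer(1))
  unmatched <- setdiff(seq_along(b), pick)
  res <- rbind(cbind(a = a, b = b[pick]),
               cbind(a = rep(NA_real_, length(unmatched)), b = b[unmatched]))
  # Assumption: interleave rows by whichever value is present (a, else b).
  key <- ifelse(is.na(res[, "a"]), res[, "b"], res[, "a"])
  res[order(key), , drop = FALSE]
}

res <- match_nearest(c(268, 1295, 1788, 2019, 2422),
                     c(266, 952, 1295, 1791, 2018))
res
```

For the vectors from the question this puts 952 (unmatched) between the 268/266 and 1295/1295 pairs and leaves 2422 unmatched in the last row.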


In a more complex case we have

a <- c(932, 1196, 1503, 2819, 3317, 3845, 4118, 4544)
b <- c(1190, 1498, 2037, 2826, 3323, 4128, 4618, 1190, 1498, 2037, 2826, 3323, 4128, 4618)

# ....

rbind(cbind(a, b = b[aux2]), cbind(a = NA, b = b[-aux2[!is.na(aux2)]]))    
#          a    b
#  [1,]  932   NA
#  [2,] 1196 1190
#  [3,] 1503 1498
#  [4,] 2819 2826
#  [5,] 3317 3323
#  [6,] 3845   NA
#  [7,] 4118 4128
#  [8,] 4544 4618
#  [9,]   NA 2037
# [10,]   NA 1190
# [11,]   NA 1498
# [12,]   NA 2037
# [13,]   NA 2826
# [14,]   NA 3323
# [15,]   NA 4128
# [16,]   NA 4618

Upvotes: 2
