Reputation: 480
I have two vectors:
a<-c(268, 1295, 1788, 2019, 2422)
b<-c(266, 952, 1295, 1791, 2018)
I want to match the elements of b to the elements of a, based on the smallest difference. So a[1] would be matched to b[1]. However, each element can only be matched to a single other element. It is possible that elements cannot be matched. If two elements of b have the smallest difference to the same element in a, then the element with the smaller difference is matched.
For example, 952 and 1295 are both closest to a[2]. Since 1295 is closer (in this case even equal) to a[2], a[2] is matched with 1295. The final solution for this particular example should look like this:
268 NA 1295 1788 2019 2422
266 952 1295 1791 2018 NA
Some of the items are not matched. Although it would be possible to match 952 and 2422, the code I need would not consider them a match, because matches were found in between them. The vectors are also strictly increasing.
With my coding capabilities I would use tons of if statements to solve this. But I was wondering whether this is a known problem whose terminology I am not aware of, or whether someone has an idea for an elegant solution.
Upvotes: 1
Views: 126
Reputation: 48211
A base R approach, although probably not the most elegant one:
aux1 <- apply(abs(outer(a, b, `-`)), 2, function(r) c(min(r), which.min(r)))
colnames(aux1) <- 1:length(b)
aux2 <- tapply(aux1[1, ], factor(aux1[2, ], levels = 1:length(a)),
function(x) as.numeric(names(which.min(x))))
rbind(cbind(a, b = b[aux2]), cbind(a = NA, b = b[-aux2[!is.na(aux2)]]))
# a b
# [1,] 268 266
# [2,] 1295 1295
# [3,] 1788 1791
# [4,] 2019 2018
# [5,] 2422 NA
# [6,] NA 952
Here aux1 contains, for each element of b, the index of the closest element of a (2nd row) and the corresponding distance (1st row).
aux1
# [,1] [,2] [,3] [,4] [,5]
# [1,] 2 343 0 3 1
# [2,] 1 2 2 3 4
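To see where these numbers come from, it may help to look at the two rows separately. The following is only a sketch using the a and b from the question; the names d, closest_a and closest_dist are introduced here for illustration and do not appear in the answer's code.

```r
a <- c(268, 1295, 1788, 2019, 2422)
b <- c(266, 952, 1295, 1791, 2018)

# Absolute differences: rows correspond to elements of a, columns to elements of b.
d <- abs(outer(a, b, `-`))

# For each element of b (each column): index of the closest element of a ...
closest_a <- apply(d, 2, which.min)
# ... and the distance to it.
closest_dist <- apply(d, 2, min)

closest_a     # 1 2 2 3 4   (2nd row of aux1)
closest_dist  # 2 343 0 3 1 (1st row of aux1)
```

Both b[2] and b[3] point at a[2], which is the conflict the next step has to resolve.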
Then aux2 may already be enough for your purposes.
aux2
# 1 2 3 4 5
# 1 3 4 5 NA
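aux1 showed that several elements of b can share the same closest element of a, but aux2 resolves this: for each element of a (names, 1st row) it gives the index of the element of b assigned to it (values, 2nd row), or NA if there is none. Then in the last line we bind the remaining, unmatched elements of b.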
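If the same steps are needed for several pairs of vectors, they could be wrapped in a function. This is a minimal sketch, not a tested general solution: the name match_closest is hypothetical, and the unmatched elements are selected with setdiff() instead of negative indexing, which is a small deviation from the code above.

```r
# Hypothetical helper wrapping the steps shown above.
match_closest <- function(a, b) {
  # For each element of b: distance to, and index of, the closest element of a.
  aux1 <- apply(abs(outer(a, b, `-`)), 2, function(r) c(min(r), which.min(r)))
  colnames(aux1) <- seq_along(b)
  # For each element of a: index of the nearest b among those that chose it
  # (NA if no element of b chose it).
  aux2 <- tapply(aux1[1, ], factor(aux1[2, ], levels = seq_along(a)),
                 function(x) as.numeric(names(which.min(x))))
  # Indices of b that were not assigned to any element of a.
  unmatched <- setdiff(seq_along(b), aux2)
  rbind(cbind(a = a, b = b[aux2]),
        cbind(a = rep(NA_real_, length(unmatched)), b = b[unmatched]))
}

a <- c(268, 1295, 1788, 2019, 2422)
b <- c(266, 952, 1295, 1791, 2018)
res <- match_closest(a, b)
res
```

For the vectors from the question this gives the same six rows as before, with 952 as the only unmatched element of b.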
In a more complex case we have
a <- c(932, 1196, 1503, 2819, 3317, 3845, 4118, 4544)
b <- c(1190, 1498, 2037, 2826, 3323, 4128, 4618, 1190, 1498, 2037, 2826, 3323, 4128, 4618)
# ....
rbind(cbind(a, b = b[aux2]), cbind(a = NA, b = b[-aux2[!is.na(aux2)]]))
# a b
# [1,] 932 NA
# [2,] 1196 1190
# [3,] 1503 1498
# [4,] 2819 2826
# [5,] 3317 3323
# [6,] 3845 NA
# [7,] 4118 4128
# [8,] 4544 4618
# [9,] NA 2037
# [10,] NA 1190
# [11,] NA 1498
# [12,] NA 2037
# [13,] NA 2826
# [14,] NA 3323
# [15,] NA 4128
# [16,] NA 4618
Upvotes: 2