Reputation: 665
I have two integer/POSIXct vectors:
a <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15) #has > 2 mil elements
b <- c(4,6,10,16) # 200000 elements
Now my resulting vector c should contain for each element of vector a the nearest element of b:
c <- c(4,4,4,4,4,6,6,...)
I tried it with apply and which.min(abs(a - b)), but it's very slow.
Is there a cleverer way to solve this? Is there a data.table solution?
Upvotes: 33
Views: 50996
Reputation: 1671
# Function
Closest <- function(x, bands) {
  sapply(x, function(y) {
    bands[which.min(abs(bands - y))]
  })
}
# Be aware that when a value lies exactly between two "bands", the band that comes first in the vector is returned
# The lines below don't return the same
Closest(x = c(0, 25000, 25001, 24999, 53000, 159000), bands = c(0, 50000, 100000))
Closest(x = c(0, 25000, 25001, 24999, 53000, 159000), bands = c(100000, 50000, 0))
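To make the tie behaviour concrete, here is a small sketch that repeats the function above and compares the two calls. The names r1 and r2 are only introduced for illustration.

```r
# Same function as above, repeated so the snippet runs on its own
Closest <- function(x, bands) {
  sapply(x, function(y) bands[which.min(abs(bands - y))])
}

x <- c(0, 25000, 25001, 24999, 53000, 159000)
r1 <- Closest(x, bands = c(0, 50000, 100000))
r2 <- Closest(x, bands = c(100000, 50000, 0))

r1[2]            # 0: the tie at 25000 resolves to the band listed first
r2[2]            # 50000: same tie, but 50000 now comes before 0
which(r1 != r2)  # 2: only the tied element differs
```

If a fixed tie rule is wanted, sorting bands before the call is one way to make the result independent of the input order.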
Upvotes: 0
Reputation: 3829
As it is presented in this link you can do either:
which(abs(x - your.number) == min(abs(x - your.number)))
or
which.min(abs(x - your.number))
where x is your vector and your.number is the value. If you have a matrix or data.frame, first convert it to a numeric vector and then apply this to the result.
For example:
x <- 1:100
your.number <- 21.5
which(abs(x - your.number) == min(abs(x - your.number)))
would output:
[1] 21 22
Update: Based on the very kind comment of hendy, I have added the following to make it clearer:
Note that the values in the output above (i.e. 21 and 22) are the indexes of the items (this is how which() works in R), so if you want the actual values, you have to use these indexes to extract them. Let's look at another example:
x <- seq(from = 100, to = 10, by = -5)
x
[1] 100 95 90 85 80 75 70 65 60 55 50 45 40 35 30 25 20 15 10
Now let's find the number closest to 42:
your.number <- 42
target.index <- which(abs(x - your.number) == min(abs(x - your.number)))
x[target.index]
which would output the "value" we are looking for from the x vector:
[1] 40
Upvotes: 53
Reputation: 101189
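Here is a simple base R option using max.col + outer. Note that max.col breaks ties at random by default (ties.method = "random"), so an element equidistant from two values of b, such as 5, may map to either neighbour, and that outer builds a full length(a) x length(b) matrix, so this only suits small inputs: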
b[max.col(-abs(outer(a,b,"-")))]
which gives
> b[max.col(-abs(outer(a,b,"-")))]
[1] 4 4 4 4 6 6 6 10 10 10 10 10 16 16 16
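For larger inputs, one possible variation is to process a in blocks so that only a chunk_size x length(b) matrix exists at any time, and to pass ties.method = "first" so ties are resolved deterministically. This is only a sketch: the function name nearest_chunked and the chunk_size parameter are made up for illustration, and the total work is still proportional to length(a) * length(b), so it stays slow at the sizes mentioned in the question.

```r
# Hypothetical helper: nearest element of b for each element of a,
# computed block by block to bound memory use.
nearest_chunked <- function(a, b, chunk_size = 10000L) {
  out <- numeric(length(a))
  if (length(a) == 0L) return(out)
  starts <- seq(1L, length(a), by = chunk_size)
  for (s in starts) {
    idx <- s:min(s + chunk_size - 1L, length(a))
    d <- abs(outer(a[idx], b, "-"))  # distances for this block only
    # ties.method = "first" picks the earlier element of b on a tie
    out[idx] <- b[max.col(-d, ties.method = "first")]
  }
  out
}

a <- 1:15
b <- c(4, 6, 10, 16)
nearest_chunked(a, b, chunk_size = 4L)
# 5, 8 and 13 are ties and map to the earlier value: 4, 6 and 10
```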
Upvotes: 3
Reputation: 571
library(data.table)
a=data.table(Value=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15))
a[,merge:=Value]
b=data.table(Value=c(4,6,10,16))
b[,merge:=Value]
setkeyv(a,c('merge'))
setkeyv(b,c('merge'))
Merge_a_b=a[b,roll='nearest']
When two data.tables are joined, the roll = 'nearest' option matches each row of the table inside the brackets (here b) to the row of the outer table (here a) whose key value is nearest. The result therefore has one row per row of b (whichever table is within the brackets). As usual, the join requires a common key column.
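Since the question asks for the nearest element of b for each element of a, the join can also be written the other way round, with a inside the brackets. The following is a sketch under a few assumptions: the column names a_val, b_val and join_val are made up for illustration, the on = argument is used instead of keys, and the sample values of a are chosen to avoid ties, whose handling is not shown here.

```r
library(data.table)

a <- c(1, 2, 3, 7, 9, 11, 14, 15)  # query values (chosen to avoid ties)
b <- c(4, 6, 10, 16)               # candidate values

dt_a <- data.table(a_val = a)
# Keep a second copy of b: in the result the join column holds a's values,
# so b_val is what carries the matched element of b.
dt_b <- data.table(b_val = b, join_val = b)

# One row per row of dt_a; each row carries the nearest b
res <- dt_b[dt_a, on = .(join_val = a_val), roll = "nearest"]
res$b_val
```

The result has as many rows as a, in the order of a, so res$b_val can be used directly as the vector c from the question.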
Upvotes: 9
Reputation: 69
For those who would be satisfied with the slow solution:
sapply(a, function(a, b) {b[which.min(abs(a-b))]}, b)
Upvotes: 6
Reputation: 2253
Late to the party, but there is now a function called Closest in the DescTools package which does almost exactly what you want (it just doesn't handle multiple values at once).
To get around this we can lapply over your a vector and find the closest value for each element.
library(DescTools)
lapply(a, function(i) Closest(x = b, a = i))
You might notice that more values are being returned than exist in a. This is because Closest will return both values if the value you are testing is exactly between two (e.g. 3 is exactly between 1 and 5, so both 1 and 5 would be returned).
To get around this, put either min or max around the result:
lapply(a, function(i) min(Closest(x = b, a = i)))
lapply(a, function(i) max(Closest(x = b, a = i)))
Then unlist the result to get a plain vector :)
Upvotes: 1
Reputation: 6921
Not quite sure how it will behave with your volume, but cut is quite fast.
The idea is to cut your vector a at the midpoints between the elements of b.
Note that I am assuming the elements in b are strictly increasing!
Something like this:
a <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15) #has > 2 mil elements
b <- c(4,6,10,16) # 200000 elements
cuts <- c(-Inf, b[-1]-diff(b)/2, Inf)
# Will yield: c(-Inf, 5, 8, 13, Inf)
cut(a, breaks=cuts, labels=b)
# [1] 4 4 4 4 4 6 6 6 10 10 10 10 10 16 16
# Levels: 4 6 10 16
This is even faster using a lower-level function like findInterval (which, again, assumes that breakpoints are non-decreasing).
findInterval(a, cuts)
# [1] 1 1 1 1 2 2 2 3 3 3 3 3 4 4 4
So of course you can do something like:
index = findInterval(a, cuts)
b[index]
# [1] 4 4 4 4 6 6 6 10 10 10 10 10 16 16 16
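Note that the two calls above treat ties differently: with the defaults, cut uses right-closed intervals, so an element of a that is equidistant from two elements of b (e.g. 5) goes to the lower one, while findInterval uses left-closed intervals and sends it to the upper one. You can choose the behaviour by passing the relevant argument (right for cut, left.open for findInterval); see their help pages.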
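Putting the pieces together, the midpoint idea can be wrapped in a small function. This is a sketch, not a tuned implementation: the name nearest_in is made up for illustration, b is sorted and de-duplicated first so the strictly increasing assumption holds, and ties go to the upper neighbour (findInterval's default).

```r
# Hypothetical helper: for each element of a, return the nearest element of b.
nearest_in <- function(a, b) {
  b <- sort(unique(b))                  # enforce strictly increasing b
  if (length(b) == 1L) return(rep(b, length(a)))
  mids <- b[-1] - diff(b) / 2           # midpoints between neighbours in b
  # findInterval() counts how many midpoints are <= each a (0..length(mids)),
  # so adding 1 gives the index of the nearest element of b.
  b[findInterval(a, mids) + 1L]
}

a <- 1:15
b <- c(4, 6, 10, 16)
nearest_in(a, b)
# 4 4 4 4 6 6 6 10 10 10 10 10 16 16 16
```

Leaving out the -Inf and Inf end points and adding 1 to the result is equivalent to the cuts vector used above; passing left.open = TRUE to findInterval would send ties to the lower neighbour instead.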
Upvotes: 13