Reputation: 7839
Suppose I have n intervals of length len on an axis. Each of them is assigned a value in an ndarray. Now I'd like to look up some values at positions query_pos, given as floating point numbers. Currently, I plan to do it the following way:
n = 100
len = 0.39483  # note: shadows the built-in len()
data = ...       # ndarray with data of length n = 100
query_pos = ...  # ndarray with positions to query
values_at_query_pos = data[(query_pos / len).astype(int)]  # questionable line
Is that a good way to do it, or are there more efficient ways to convert a floating point query position into an integer index and then read from an array? I particularly wonder whether astype(int) is a cheap or an expensive operation, compared for example to the division or the memory read.
Some more remarks:
Finally, it will be used in 2 and 3 dimensions. Currently, I plan to catch positions that would lead to illegal indices before they go into the lookup stage.
The data array will have a high enough resolution that I don't need any filtering or interpolation; that will be done in previous stages.
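The bounds check mentioned above could be sketched like this, using a boolean mask before the lookup (the sample data and query positions here are made up for illustration):

```python
import numpy as np

n = 100
interval_len = 0.39483  # renamed to avoid shadowing the built-in len()
data = np.random.rand(n)
query_pos = np.array([0.0, 1.5, 38.0, 40.0])  # 40.0 falls outside the covered range

# Convert positions to indices, then keep only queries that hit a legal index.
idx = (query_pos / interval_len).astype(int)
valid = (idx >= 0) & (idx < n)
values = data[idx[valid]]
```

Whether to drop invalid queries, clip them, or raise is a design choice for the earlier stages; the mask here just drops them.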
Upvotes: 0
Views: 366
Reputation: 589
You can output the result of the division straight to an int array:
idx = np.empty_like(query_pos, int)
np.divide(query_pos, len, out=idx, casting='unsafe')
This will be noticeably faster only for large arrays. But this code is harder to read, so only optimize if it's a bottleneck!
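Put together, a minimal end-to-end sketch of this trick (len1 stands in for the question's len; the sample data is made up):

```python
import numpy as np

len1 = 0.39483
data = np.random.rand(100)
query_pos = np.random.uniform(0.0, 39.0, 10000)

# Allocate the integer index array once, then write the division result
# into it; the unsafe cast truncates toward zero, like astype(int).
idx = np.empty_like(query_pos, dtype=int)
np.divide(query_pos, len1, out=idx, casting='unsafe')

values = data[idx]
```

For non-negative positions the truncating cast matches astype(int) exactly, so the result is the same as the original one-liner.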
Upvotes: 1
Reputation: 221674
Instead of dividing each element of query_pos by that scalar, we can pre-calculate the reciprocal of the scalar and multiply instead, for some speedup there. The intuition is that division is a costlier operation than multiplication.
Here's a quick runtime test on it -
In [177]: # Setup inputs
...: n = 100
...: len1 = 0.39483
...: data = np.random.rand(100)
...: query_pos = np.random.randint(0,25,(100000))
...:
In [178]: %timeit query_pos / len1
1000 loops, best of 3: 612 µs per loop
In [179]: %timeit query_pos * (1/len1)
10000 loops, best of 3: 173 µs per loop
Secondly, if there are many repeated indices, just like in the setup used for the runtime test shown earlier, we can use np.take for some further marginal improvement, as shown below -
In [196]: %timeit data[(query_pos *(1/len1)).astype(int)]
1000 loops, best of 3: 538 µs per loop
In [197]: %timeit np.take(data,(query_pos * (1/len1)).astype(int))
1000 loops, best of 3: 526 µs per loop
If you are planning to use it on generic ndarrays, we would need to use the axis param with np.take.
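For instance, in the 2-D case the axis param lets you pull out whole rows by the converted indices (a small made-up example):

```python
import numpy as np

# 4 rows of 3 values each; look up rows by integer index.
data2d = np.arange(12.0).reshape(4, 3)
idx = np.array([0, 2, 2, 3])

rows = np.take(data2d, idx, axis=0)  # equivalent to data2d[idx, :]
```

Without axis, np.take indexes into the flattened array, so axis=0 is what matches the row-lookup intent here.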
Comparing it with the original approach -
In [202]: %timeit data[(query_pos / len1).astype(int)]
1000 loops, best of 3: 967 µs per loop
Finally, on the question of how the division operation stacks up against converting to int, they seem comparable on the big dataset. But it seems you can't avoid the conversion, as it's needed for indexing. Here's a timing test on it -
In [210]: idx = query_pos * (1/len1)
In [211]: %timeit query_pos * (1/len1)
10000 loops, best of 3: 165 µs per loop
In [212]: %timeit idx.astype(int)
10000 loops, best of 3: 110 µs per loop
Upvotes: 4