Reputation: 364
I have two 1D Numpy arrays start
and stop
, both of which contain integers (which are used to index some other array). I have the following piece of code.
index_list = []
for i in range(len(start)):
temp = range(start[i], stop[i])
index_list.extend(temp)
index_list = np.array(index_list)
Is there a simple way to vectorize this?
Upvotes: 2
Views: 106
Reputation: 67427
You can vectorize this as follows:
def make_index_list(start, stop):
lens = stop - start
cum_lens = np.cumsum(lens)
# Sequential indices the same length as the expected output
out = np.arange(cum_lens[-1])
# Starting index for each section of `out`
cum_lens = np.concatenate(([0], cum_lens[:-1]))
# How much each section of `out` is off from the correct value
deltas = start - out[cum_lens]
# Apply the correction
out += np.repeat(deltas, lens)
return out
With some made up data:
start = np.random.randint(100, size=(100000,))
stop = start + np.random.randint(1, 10 ,size=start.shape)
We can take the code for a test ride:
In [39]: %%timeit
....: index_list = []
....: for i in range(len(start)):
....: temp = range(start[i], stop[i])
....: index_list.extend(temp)
....: index_list = np.array(index_list)
....:
10 loops, best of 3: 137 ms per loop
In [40]: %timeit make_index_list(start, stop)
100 loops, best of 3: 9.27 ms per loop
In [41]: np.array_equal(make_index_list(start, stop), index_list)
Out[41]: True
So it is correct and about 15x faster, not bad at all...
Upvotes: 5
Reputation: 74172
If you're prepared to get your hands dirty, you can speed it up quite significantly using Cython
Original function, for reference:
import numpy as np
def original_indices(start, stop):
index_list = []
for i in range(len(start)):
temp = range(start[i], stop[i])
index_list.extend(temp)
return np.array(index_list)
Cythonized version:
#!python
# cython: boundscheck=False
# cython: wraparound=False
import numpy as np
cimport numpy as np
def cython_indices(Py_ssize_t[:] start, Py_ssize_t[:] stop):
cdef:
Py_ssize_t final_size, count, ii
Py_ssize_t[:] index_array
final_size = 0
for ii in range(start.shape[0]):
final_size += stop[ii] - start[ii]
index_array = np.empty(final_size, dtype=np.int64)
count = 0
for ii in range(start.shape[0]):
idx = start[ii]
while idx < stop[ii]:
index_array[count] = idx
idx += 1
count += 1
return index_array
Some fake data:
start = np.random.random_integers(0, 1000, size=100000)
stop = start + np.random.random_integers(0, 10, size=100000)
Some timings:
%timeit original_indices(start, stop)
# 10 loops, best of 3: 79.4 ms per loop
%timeit cython_indices(start, stop)
# 1000 loops, best of 3: 1.35 ms per loop
Cython speeds things up by an order of magnitude compared with the original version.
Upvotes: 2