Reputation: 42409
I have a block of code which does the following:
b_lst
below, of index indx
i
and the next one (of index i+1
) in list a_lst
indx
in a sub-list of a third list (c_lst
) where the index of that sub-list is the index of the left float in a_lst
(ie: i
)b_lst
Here's a MWE
which shows what the code does:
import numpy as np
import timeit
def random_data(N):
# Generate some random data.
return np.random.uniform(0., 10., N).tolist()
# Data lists.
# Note that a_lst is sorted.
a_lst = np.sort(random_data(1000))
b_lst = random_data(5000)
# Fixed index value (int)
c = 25
def func():
# Create empty list with as many sub-lists as elements present
# in a_lst beyond the 'c' index.
c_lst = [[] for _ in range(len(a_lst[c:])-1)]
# For each element in b_lst.
for indx,elem in enumerate(b_lst):
# For elements in a_lst beyond the 'c' index.
for i in range(len(a_lst[c:])-1):
# Check if 'elem' is between this a_lst element
# and the next.
if a_lst[c+i] < elem <= a_lst[c+(i+1)]:
# If it is then store the index of 'elem' ('indx')
# in the 'i' sub-list of c_lst.
c_lst[i].append(indx)
return c_lst
print func()
# time function.
func_time = timeit.timeit(func, number=10)
print func_time
This code works as it should but I really need to improve its performance since it's slowing down the rest of my code.
Add
This is the optimized function based on the accepted answer. It's quite ugly but it gets the job done.
def func_opt():
c_lst = [[] for _ in range(len(a_lst[c:])-1)]
c_opt = np.searchsorted(a_lst[c:], b_lst, side='left')
for elem in c_opt:
if 0<elem<len(a_lst[c:]):
c_lst[elem-1] = np.where(c_opt==elem)[0].tolist()
return c_lst
In my tests this is ~7x faster than the original function.
Add 2
Much faster not using np.where
:
def func_opt2():
c_lst = [[] for _ in range(len(a_lst[c:])-1)]
c_opt = np.searchsorted(a_lst[c:], b_lst, side='left')
for indx,elem in enumerate(c_opt):
if 0<elem<len(a_lst[c:]):
c_lst[elem-1].append(indx)
return c_lst
This is ~130x faster than the original function.
Add 3
Following jtaylor's advice I converted the result of np.searchsorted
to a list with .tolist()
:
def func_opt3():
c_lst = [[] for _ in range(len(a_lst[c:])-1)]
c_opt = np.searchsorted(a_lst[c:], b_lst, side='left').tolist()
for indx,elem in enumerate(c_opt):
if 0<elem<len(a_lst[c:]):
c_lst[elem-1].append(indx)
return c_lst
This is ~470x faster than the original function.
Upvotes: 2
Views: 96
Reputation: 67457
You want to take a look at numpy's searchsorted
. Calling
np.searchsorted(a_lst, b_lst, side='right')
will return an array of indices, the same length as b_lst
, holding before which item in a_lst
they should be inserted to preserve order. It will be very fast, as it uses binary search and the looping happens in C. You could then create your subarrays with fancy indexing, e.g.:
>>> a = np.arange(1, 10)
>>> b = np.random.rand(100) * 10
>>> c = np.searchsorted(a, b, side='right')
>>> b[c == 0]
array([ 0.54620226, 0.40043875, 0.62398925, 0.40097674, 0.58765603,
0.14045264, 0.16990249, 0.78264088, 0.51507254, 0.31808327,
0.03895417, 0.92130027])
>>> b[c == 1]
array([ 1.34599709, 1.42645778, 1.13025996, 1.20096723, 1.75724448,
1.87447058, 1.23422399, 1.37807553, 1.64118058, 1.53740299])
Upvotes: 3