La-Li-Lu-Le-Low

Reputation: 211

How to parallelize function on list comprehension (and keep order)

I have a list of 2d arrays (of different lengths) and need to apply certain functions to each of them efficiently. At the moment I do this with a list comprehension.

Since that is still not fast enough, I want to parallelize the list comprehension.

What is the proper way to do that while keeping the order of the slices (or "subarrays")?

def get_slice_max(arr):
    '''
    Return the running maximum of arr: every element is replaced with the
    largest value that has occurred so far (including the current one).
    '''
    result = [arr[0]]
    for i in range(1, len(arr)):
        result.append(max(result[-1], arr[i]))
    return result

result = [get_slice_max(slice_) for slice_ in a]

Reproducible sample:

import random
import numpy as np

a = [np.array(range(1, random.randint(3, 8))) for _ in range(10000)]

Edit: I need the parallelization for list comprehensions like these:

temp = np.random.randint(1, high=100, size=10)  # determines the sizes of the subarrays
# randint's upper bound is exclusive, so high=2 is needed to get both 0s and 1s
A, B, C = (
    [np.random.randint(0, high=2, size=x) for x in temp],
    [np.random.uniform(size=x) for x in temp],
    [np.random.uniform(size=x) for x in temp],
)
result = [[y if x == 1 else z for x, y, z in zip(a, b, c)]
          for a, b, c in zip(A, B, C)]

temp = np.random.randint(1, high=100, size=10)  # determines the sizes of the subarrays
# high is exclusive, so high=2 yields 0s and 1s (leading zeros in e still divide by zero)
D, E = [np.random.uniform(size=x) for x in temp], [np.random.randint(0, high=2, size=x) for x in temp]
[[x / y for x, y in zip(d, np.maximum.accumulate(get_slice_max(e)))] for d, e in zip(D, E)]

Upvotes: 0

Views: 1177

Answers (1)

Chris

Reputation: 29742

Use numpy.maximum.accumulate, which computes the running maximum of an array in one vectorized call:

import numpy as np

# Sample
a = [np.random.randint(1, 10, np.random.randint(3, 8)) for _ in range(10000)]
a[:3]
# [array([4, 5, 6]), array([7, 2, 8, 2, 9, 5]), array([5, 1, 7, 5])]

[np.maximum.accumulate(arr) for arr in a]

Output (first three arrays):

[array([4, 5, 6]), array([7, 7, 8, 8, 9, 9]), array([5, 5, 7, 7])]

Validation:

all(np.array_equal(get_slice_max(arr), np.maximum.accumulate(arr)) for arr in a)
# True
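
The same vectorized approach seems to carry over to the comprehensions in the question's edit: the inner `y if x == 1 else z` loop is what `np.where` does elementwise, and the inner division loop is plain array division. A minimal sketch, assuming NumPy's `default_rng` generator for the sample data; the names `rng`, `sizes`, `selected` and `ratios` are illustrative, and `E` is drawn from strictly positive integers here (an assumption, to avoid dividing by zero):

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = rng.integers(1, 100, size=10)  # lengths of the subarrays

# Sample data shaped like A, B, C in the question (A holds 0s and 1s).
A = [rng.integers(0, 2, size=n) for n in sizes]
B = [rng.uniform(size=n) for n in sizes]
C = [rng.uniform(size=n) for n in sizes]

# Elementwise selection: take b where a == 1, otherwise c.
selected = [np.where(a == 1, b, c) for a, b, c in zip(A, B, C)]

# Sample data shaped like D, E; E is strictly positive (assumption).
D = [rng.uniform(size=n) for n in sizes]
E = [rng.integers(1, 10, size=n) for n in sizes]

# Divide each d by the running maximum of the matching e.
ratios = [d / np.maximum.accumulate(e) for d, e in zip(D, E)]
```

Only the outer comprehension over the subarrays remains, so the order of the results is the order of the inputs.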

Benchmark (about 5x faster):

%timeit [np.maximum.accumulate(arr) for arr in a]
# 6.07 ms ± 498 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit [get_slice_max(arr) for arr in a]
# 32.4 ms ± 11 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
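
If parallelism is still wanted after vectorizing, one order-preserving option is `Executor.map` from the standard library's `concurrent.futures`, which returns results in the order of the inputs. A minimal sketch, not a tuned implementation: `ordered_parallel_map` and `running_max` are hypothetical helper names, and the chunk size of 256 is an arbitrary assumption. For arrays this small, the cost of sending data between processes may well outweigh the gain, so it is worth timing against the plain comprehension.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def running_max(arr):
    """Running maximum of one array (same values as get_slice_max)."""
    return np.maximum.accumulate(arr)


def ordered_parallel_map(func, items, executor_cls=ProcessPoolExecutor,
                         max_workers=None, chunksize=256):
    """Apply func to every item in parallel and return results in input order.

    Hypothetical helper for this sketch; func must be defined at module
    level so that a process pool can pickle it.
    """
    with executor_cls(max_workers=max_workers) as executor:
        # Executor.map yields results in the order of the inputs,
        # regardless of which worker finishes first.
        return list(executor.map(func, items, chunksize=chunksize))
```

With a process pool, the call belongs under an `if __name__ == "__main__":` guard, for example `result = ordered_parallel_map(running_max, a)`. Larger chunks reduce the per-task overhead when there are many small arrays.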

Upvotes: 1
