Xenon

Reputation: 123

Is there a way to make list processing as fast as np.array?

I am currently rewriting some code, which I originally wrote on the assumption that the inputs are numpy arrays, so that it accepts arbitrary lists as input. Unfortunately, the solutions I have produced so far are substantially slower than the original code. Can someone give advice on how I might get back to the speed of the original solution?

The code is supposed to produce a boolean index for the upper triangular matrix representation. Leaving out input checks and the like, this is the core of the code.

Some imports and example input:

import numpy as np
descriptor = list(range(100))
descriptor_arr = np.array(descriptor)
value = [0, 2, 13, 14, 11, 23, 45, 16]

This is my current list-based version:

def get_idx_slow(descriptor, value):
    ix, iy = np.triu_indices(len(descriptor), 1)
    pattern_in_value = [p in value for p in descriptor]
    return [(pattern_in_value[idx_x] & pattern_in_value[idx_y]) for idx_x, idx_y in zip(ix, iy)]

This is the previous array-based version:

def get_idx_fast(descriptor, value):
    ix, iy = np.triu_indices(len(descriptor), 1)
    selection_x = np.any(np.array([descriptor[ix] == v for v in value]), axis=0)
    selection_y = np.any(np.array([descriptor[iy] == v for v in value]), axis=0)
    return selection_x & selection_y

My timing results:

%timeit get_idx_slow(descriptor, value)
1.2 ms ± 33.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit get_idx_fast(descriptor_arr, value)
217 µs ± 1.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Upvotes: 0

Views: 42

Answers (1)

alexander-jh

Reputation: 58

It's definitely the lazy solution, but you can just convert the list to an array inside the slow function, call the array-based function, and convert the result back to a list. In my timings this was reasonably effective.

Update:

def get_idx_slow(descriptor, value):
    return get_idx_fast(np.asarray(descriptor), value).tolist()

Results:

%timeit get_idx_slow_orig(descriptor, value)
892 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit get_idx_slow(descriptor, value)
182 µs ± 1.11 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit get_idx_fast(descriptor_arr, value)
150 µs ± 1.21 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
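
A possible variation on the same idea, as an untimed sketch: since the membership test only depends on each descriptor entry, it could be done once with np.isin and then indexed with the triangular indices, rather than comparing against every value for ix and iy separately. The name get_idx_isin is made up for illustration, and np.isin is not used anywhere in the original code, so treat this as an assumption to benchmark on your own data.

```python
import numpy as np

def get_idx_isin(descriptor, value):
    # Hypothetical helper, not part of the original code.
    # np.asarray avoids a copy when the input is already an array.
    descriptor = np.asarray(descriptor)
    ix, iy = np.triu_indices(len(descriptor), 1)
    # One membership test per descriptor entry, reused for both index sets.
    in_value = np.isin(descriptor, value)
    return in_value[ix] & in_value[iy]

descriptor = list(range(100))
value = [0, 2, 13, 14, 11, 23, 45, 16]
mask = get_idx_isin(descriptor, value)
```

This returns a boolean array; append .tolist() if a plain list is needed, as in the wrapper above.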

Upvotes: 1
