Elliot Gorokhovsky
Elliot Gorokhovsky

Reputation: 3762

Optimizing python one-liner

I profiled my program, and more than 80% of the time is spent in this one-line function! How can I optimize it? I am running with PyPy, so I'd rather not use NumPy, but since my program is spending almost all of its time there, I think giving up PyPy for NumPy might be worth it. However, I would prefer to use the CFFI, since that's more compatible with PyPy.

#x, y, are lists of 1s and 0s. c_out is a positive int. bit is 1 or 0.
def findCarryIn(x, y, c_out, bit):

    return (2 * c_out +
            bit -
            sum(map(lambda x_bit, y_bit: x_bit & y_bit, x, reversed(y)))) #note this is basically a dot product.

Upvotes: 0

Views: 118

Answers (2)

Anand S Kumar
Anand S Kumar

Reputation: 90989

Without using Numpy, After testing with timeit , The fastest method for the summing (that you are doing) seems to be using simple for loop and summing over the elements, Example -

def findCarryIn(x, y, c_out, bit):
    s = 0
    for i,j in zip(x, reversed(y)):
        s += i & j
    return (2 * c_out + bit - s)

Though this did not increase the performance by a lot (maybe 20% or so).

The results of timing tests (With different methods , func4 containing the method described above) -

def func1(x,y):
    return sum(map(lambda x_bit, y_bit: x_bit & y_bit, x, reversed(y)))

def func2(x,y):
    return sum([i & j for i,j in zip(x,reversed(y))])

def func3(x,y):
    return sum(x[i] & y[-1-i] for i in range(min(len(x),len(y))))

def func4(x,y):
    s = 0
    for i,j in zip(x, reversed(y)):
        s += i & j
    return s

In [125]: %timeit func1(x,y)
100000 loops, best of 3: 3.02 µs per loop

In [126]: %timeit func2(x,y)
The slowest run took 6.42 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 2.9 µs per loop

In [127]: %timeit func3(x,y)
100000 loops, best of 3: 4.31 µs per loop

In [128]: %timeit func4(x,y)
100000 loops, best of 3: 2.2 µs per loop

Upvotes: 1

Bas Swinckels
Bas Swinckels

Reputation: 18488

This can for sure be sped up a lot using numpy. You could define your function something like this:

def find_carry_numpy(x, y, c_out, bit):
    return 2 * c_out + bit - np.sum(x & y[::-1])

Create some random data:

In [36]: n = 100; c = 15; bit = 1

In [37]: x_arr = np.random.rand(n) > 0.5

In [38]: y_arr = np.random.rand(n) > 0.5

In [39]: x_list = list(x_arr)

In [40]: y_list = list(y_arr)

Check that results are the same:

In [42]: find_carry_numpy(x_arr, y_arr, c, bit)
Out[42]: 10

In [43]: findCarryIn(x_list, y_list, c, bit)
Out[43]: 10

Quick speed test:

In [44]: timeit find_carry_numpy(x_arr, y_arr, c, bit)
10000 loops, best of 3: 19.6 µs per loop

In [45]: timeit findCarryIn(x_list, y_list, c, bit)
1000 loops, best of 3: 409 µs per loop

So you gain a factor of 20 in speed! That is a pretty typical speedup when converting Python code to Numpy.

Upvotes: 1

Related Questions