Mr. T
Mr. T

Reputation: 12410

Speeding up summation in Python

I have a large list with numbers n1 in listn1. I want to add all multiples of each number, if the multiplicand is not in another list listn2 (prime numbers with specific characteristics) and the product is below a maximum max_n. The multiplicand is in most cases below 10, but can go up to 100,000. My code so far looks like:

s = 0
for n1 in listn1:
    s += sum(n1 * i for i in range(1, 1 + max_n // n1) if i not in listn2)

The problem: this approach is sloooow. It takes seconds to calculate listn1 and listn2, so I know that there are more than a million numbers to add. But I started the summation yesterday evening and it is still running this morning.

Is there a Pythonic way to speed up this summation?

Upvotes: 1

Views: 570

Answers (2)

Mr. T
Mr. T

Reputation: 12410

Taking the suggestions on board, I re-wrote the code to

setn2 = set(listn2)
s = 0
for n1 in listn1:
    s += n1 * (max_n * (max_n + 1) // 2 - sum(i for i in range(1, 1 + max_n // n1) if i in setn2))

Instead of hours, the summation now takes seconds.

Several hours later

Coming across this old question of mine, it turned out that retrospectively, I did the right thing. The idea mentioned here in this thread to use numpy.sum() instead was well intended but wrong, as shown here. If you work in numpy, fine, but if you have a Python list, use comprehensions.

Upvotes: 0

Alperen
Alperen

Reputation: 4592

I have 2 suggestions for you.

First of all, you don't have multiply i with n1 at each iteration. You can replace

s += sum(n1 * i for i in range(1, 1 + max_n // n1) if i not in listn2)

with

s += n1 * sum(i for i in range(1, 1 + max_n // n1) if i not in listn2)

They are totally same.

Secondly, without if i not in listn2 condition, you have a simple summation:

sum(i for i in range(1, 1 + max_n // n1)

This is same with sum([1, 2, 3, 4, 5, 6, 7, 8, ..., (max_n // n1)]), and equals (max_n // n1) * (1 + max_n // n1) / 2. For a simply example, take a look at this.

To handle if i not in listn2 condition, if your listn2 is smaller, you can sum listn2 instead of listn1.

So, find the sum of listn1 and subtract the items in listn2:

def sum_until(l, max):
    return sum([x for x in l if x < max])

listn2 = list(set(listn2))

for n1 in listn1:
    finish = max_n // n1
    s += n1 * (finish * (finish + 1) / 2 - sum_until(listn2, finish)) 

EDIT:

I guess NumPy would be faster for sum. Make listn2 a numpy array:

import numpy as np

listn2 = np.array(list(set(listn2))) 

And use this sum_until function:

def sum_until(listn2, max):
    l = listn2[np.where(listn2 <= max)]
    return int(np.sum(l))

Upvotes: 1

Related Questions