Mau Yin

Reputation: 13

Is there any way to speed up the computation time of calling a function multiple times in Python?

    import numpy as np
    import matplotlib.pyplot as plt
    from numpy import random
    import time
    from collections import Counter
   
    def simulation(N):
        I = 10
        success = 0
        M = 100
        for i in range(I):
            s = allocate(N,M)
            M -= s
            success += s
        return success

    def allocate(N,M):
        count = Counter(random.randint(N,size = M))
        success = sum(j for v,j in count.items() if j == 1)
        return success

    if __name__ == "__main__":
        start = time.perf_counter() 
        SAMPLE_SIZE = 100000
        N = np.linspace(5,45,41).astype(int)
        Ps = []
        for n in N:
            ps = []
            for _ in range(SAMPLE_SIZE):
                ps.append(simulation(n)/100)
            result = np.average(np.array(ps))
            Ps.append(result)

        elapsed = (time.perf_counter() - start)
        print("Time used:",elapsed)
        plt.scatter(N,Ps)
        plt.show()

Here is my situation. The ultimate goal is to set SAMPLE_SIZE to 10^7. However, when I set it to 10^5, it already takes about 1000 seconds to run. Is there any way to make it more efficient and faster? Thanks for any suggestions.

Upvotes: 1

Views: 1304

Answers (2)

Simon Tartakovksy

Reputation: 309

I might be missing the point, but it seems like you can replace your simulation with a closed-form calculation.

I believe the problem you are solving is to find the expected number of boxes with exactly 1 ball in them, given a random distribution of M balls in N boxes.

Follow the answer here https://math.stackexchange.com/a/66094 to get the closed-form expression (André Nicolas solved it for 10 balls and 5 boxes, but you should be able to extrapolate).

(As a side note, I will often also write code to confirm that my probability calculations are correct; if this is what you are doing, sorry for stating the obvious :P )
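
For reference, here is a minimal sketch of the one-round closed form this answer points at: by linearity of expectation, the expected number of boxes holding exactly one ball when M balls land uniformly in N boxes is M * (1 - 1/N)^(M - 1). Note that the question's 10-round loop makes M random after the first round, so the full process is not just this formula applied 10 times.

import numpy as np

# Expected number of boxes containing exactly one ball after dropping M balls
# uniformly at random into N boxes: each box holds exactly one ball with
# probability (M/N) * (1 - 1/N)**(M - 1); summing over the N boxes gives
# M * (1 - 1/N)**(M - 1) by linearity of expectation.
def expected_singletons(N, M):
    return M * (1 - 1 / N) ** (M - 1)

# First allocation round of the question (M = 100 balls), on the same N grid
for n in np.linspace(5, 45, 41).astype(int):
    print(n, expected_singletons(n, 100))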

Upvotes: 0

Jérôme Richard

Reputation: 50308

First of all, the implementation of allocate is not very efficient: you can use vectorized Numpy functions to do that:

def allocate(N, M):
    success = np.count_nonzero(np.bincount(random.randint(N, size=M)) == 1)
    return success
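
If you want to convince yourself the replacement is equivalent, a quick sanity check (just a sketch: draw one sample and count it both ways) could look like this:

import numpy as np
from numpy import random
from collections import Counter

# One draw, counted with the original Counter logic and with np.bincount;
# both give the number of boxes that received exactly one ball.
N, M = 20, 100
draw = random.randint(N, size=M)
counter_success = sum(j for v, j in Counter(draw).items() if j == 1)
bincount_success = np.count_nonzero(np.bincount(draw) == 1)
assert counter_success == bincount_success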

The thing is, most of the time comes from the overhead of Numpy functions performing checks and creating temporary arrays. You can use Numba to fix this problem:

import numba as nb

@nb.njit('int_(int_, int_)')
def allocate(N,M):
    tmp = np.zeros(N, np.int_)
    for i in range(M):
        rnd = np.random.randint(0, N)
        tmp[rnd] += 1
    count = 0
    for i in range(N):
        count += tmp[i] == 1
    return count

Then, you can speed up the code a bit further by applying the Numba decorator @nb.njit('int_(int_)') to the simulation function, so as to avoid the overhead of calling Numba functions from the CPython interpreter.

Finally, you can speed up the main loop by running it in parallel with Numba (and also avoid the use of slow lists). You can also recycle the tmp array so as not to cause too many allocations (which are expensive and do not scale with the number of cores). Here is the resulting final code:

import numpy as np
import matplotlib.pyplot as plt
import time
import numba as nb

# Recycle the `tmp` buffer not to do many allocations
@nb.njit('int_(int_, int_, int_[::1])')
def allocate(N, M, tmp):
    tmp.fill(0)
    for i in range(M):
        rnd = np.random.randint(0, N)
        tmp[rnd] += 1
    count = 0
    for i in range(N):
        count += tmp[i] == 1
    return count

@nb.njit('int_(int_)')
def simulation(N):
    I = 10
    success = 0
    M = 100
    tmp = np.zeros(N, np.int_) # Preallocated buffer
    for i in range(I):
        s = allocate(N, M, tmp)
        M -= s
        success += s
    return success

@nb.njit('float64(int_, int_)', parallel=True)
def compute_ps_avg(n, sample_size):
    ps = np.zeros(sample_size, dtype=np.float64)
    for i in nb.prange(sample_size):
        ps[i] = simulation(n) / 100.0
    # Note that np.average is not yet supported by Numba
    return np.mean(ps)

if __name__ == "__main__":
    start = time.perf_counter() 
    SAMPLE_SIZE = 100_000
    N = np.linspace(5,45,41).astype(int)
    Ps = [compute_ps_avg(n, SAMPLE_SIZE) for n in N]
    elapsed = (time.perf_counter() - start)
    print("Time used:",elapsed)
    plt.scatter(N,Ps)
    plt.show()

Here are performance results on my 10-core machine:

Initial code:          670.6 s
Optimized Numba code:    3.9 s

The resulting code is 172 times faster.

More than 80% of the time is spent in the generation of random numbers. Thus, if you want the code to be faster, one solution is to speed up the generation of random numbers using a SIMD-optimized random number generator. Unfortunately, AFAIK, it is not possible to achieve this (efficiently) in Python. You certainly need to use a native language like C or C++ to do that.
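
If you want to estimate the RNG share on your own machine, one rough check (a sketch, not a rigorous profile; allocate_no_rng is a name introduced here, and it assumes the allocate(N, M, tmp) from the final code above is already defined) is to time a kernel that does the same work with a cheap deterministic index instead of a random draw:

import time
import numpy as np
import numba as nb

# Same work as allocate, but with a deterministic index instead of np.random.randint.
@nb.njit('int_(int_, int_, int_[::1])')
def allocate_no_rng(N, M, tmp):
    tmp.fill(0)
    for i in range(M):
        tmp[i % N] += 1  # deterministic stand-in for np.random.randint(0, N)
    count = 0
    for i in range(N):
        count += tmp[i] == 1
    return count

# Functions with explicit signatures are compiled at definition time, so both
# kernels can be timed back to back; the gap roughly approximates the RNG cost.
N, M, reps = 20, 100, 1_000_000
tmp = np.zeros(N, np.int_)

start = time.perf_counter()
for _ in range(reps):
    allocate(N, M, tmp)
print("with RNG   :", time.perf_counter() - start)

start = time.perf_counter()
for _ in range(reps):
    allocate_no_rng(N, M, tmp)
print("without RNG:", time.perf_counter() - start)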

Upvotes: 2
