mrCarnivore


How to efficiently fill a np.array?

I am trying to fill a NumPy array with data. However, the higher the index, the longer each step takes. Why?

And how can I prevent that? I have already created the arrays with their final dimensions...

import random
import numpy as np

# p = [ ... 2200 values in a python list ... ]

iterations = 1000
max_draws = len(p)-1

percentiles = np.zeros(max_draws)
money_list = np.zeros(iterations)

invest = 100
for k in range(1,max_draws):
    print(k)
    for j in range(0,iterations):
        money_list[j] = (invest * np.random.choice(p, k)).sum()

    percentiles[k] = np.percentile(money_list, 5)

I have a list of factors p that represent the gains from trades on the stock market. I want to find out how many of those trades (drawn from the list of possible trades) I must make so that, with 95 % probability, I make money rather than lose it (given that I make money if I make all the trades).


Answers (1)

Arty


On top of all the improvements already suggested, one more very effective optimization is possible.

If you don't mind installing and using the fairly heavy extra pip package numba (python -m pip install numba), you can improve the speed considerably, as in the code below.

Numba is a just-in-time compiler that turns Python functions into efficient machine code, and it is designed to work with NumPy. It translates Python loops into native code using LLVM, so they run at speeds comparable to C.

The code below achieves a speedup of 4.18x for the 2199 iterations of the outer loop used in your code, and a speedup of around 100x or more when only a few (5-20) iterations are run. Using Numba, all 2199 iterations for your case were done in 90 seconds on my slow PC.


# Needs: python -m pip install numpy numba
import numpy as np, numba, timeit

p = np.random.random((2200,)) # or do p = np.array(p) if p is a list

iterations = 1000
max_draws = len(p) - 1

invest = 100

def do_regular(hi):
    percentiles = np.zeros(max_draws)
    money_list = np.zeros(iterations)

    for k in range(1, hi):
        for j in range(0,iterations):
            money_list[j] = (invest * np.random.choice(p, k)).sum()

        percentiles[k] = np.percentile(money_list, 5)
        
    return percentiles, money_list

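# Compile the same function with Numba in nopython mode. Note that the globals
# it reads (p, iterations, max_draws, invest) are frozen at compile time.
do_numba = numba.jit(nopython = True)(do_regular)

do_numba(2) # Warm-up call: triggers JIT compilation so it is not timed below
for hi in [8, 16, 32, 64, 128, 256, 512, max_draws]: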
    tr = timeit.timeit(lambda: do_regular(hi), number = 1)
    tn = timeit.timeit(lambda: do_numba(hi), number = 1)
    print(str(hi).rjust(4), 'regular', round(tr, 3), 'sec')
    print(str(hi).rjust(4), 'numba', round(tn, 3), 'sec, speedup', round(tr / tn, 2), flush = True)

Output:

   8 regular 0.604 sec
   8 numba 0.005 sec, speedup 131.2
  16 regular 1.296 sec
  16 numba 0.013 sec, speedup 101.36
  32 regular 2.672 sec
  32 numba 0.034 sec, speedup 78.18
  64 regular 5.515 sec
  64 numba 0.113 sec, speedup 48.87
 128 regular 11.3 sec
 128 numba 0.374 sec, speedup 30.19
 256 regular 23.758 sec
 256 numba 1.35 sec, speedup 17.59
 512 regular 51.767 sec
 512 numba 5.086 sec, speedup 10.18
2199 regular 376.327 sec
2199 numba 90.104 sec, speedup 4.18
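
As for why the time grows with the index: each inner call draws k values, so the cost of one outer step grows linearly with k. Independently of Numba, the inner Python loop can probably be removed by drawing all samples for a given k in a single call. The following is only a minimal sketch under stated assumptions, not a benchmarked replacement: the function name percentiles_vectorized and its parameters max_k and seed are hypothetical, and it uses NumPy's Generator API instead of np.random.choice, so the random numbers will differ from the original code.

```python
import numpy as np

def percentiles_vectorized(p, iterations=1000, invest=100, max_k=None, seed=0):
    """5th percentile of the total return for k = 1 .. max_k - 1 draws.

    Hypothetical helper: mirrors the loop in the question, but draws all
    `iterations` samples for a given k at once.
    """
    p = np.asarray(p, dtype=float)
    if max_k is None:
        max_k = len(p) - 1  # same upper bound as max_draws in the question
    rng = np.random.default_rng(seed)
    percentiles = np.zeros(max_k)  # index 0 stays 0, as in the original code
    for k in range(1, max_k):
        # One call yields an (iterations, k) array, sampled with replacement
        # just like np.random.choice(p, k); each row is one simulated run.
        draws = rng.choice(p, size=(iterations, k))
        money = invest * draws.sum(axis=1)
        percentiles[k] = np.percentile(money, 5)
    return percentiles

if __name__ == "__main__":
    # Placeholder data, as in the benchmark above; replace with the real p.
    p = np.random.random(2200)
    print(percentiles_vectorized(p, max_k=16))
```

The total amount of random data drawn is unchanged, so the run time still grows with k; the sketch only removes the per-iteration Python overhead.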
