Angelica White

Reputation: 19

Appending to a numpy array in for loop

I'm trying to create a Monte Carlo simulation to simulate future stock prices using Numpy arrays.

My current approach is: create a For Loop which fills an array, stock_price_array, with simulated stock prices. These stock prices are generated by taking the last stock price, then multiplying it by 1 + an annual return. The annual returns are drawn randomly from a normal distribution and stored in the array annual_ret.

My problem is that although the "stock price" variables I print from my For Loop appear to be correct, I simply cannot figure out how to Append these stock price variables to stock_price_array.

I've tried various methods, including initializing the stock_price_array using .full instead of .empty, changing the order of where the array appears in the For Loop, and checking the size of the array.

I've read other Stack Overflow posts on similar topics but can't figure out what I'm doing wrong.

Thank you in advance for your help!

import numpy as np

annual_mean = .06
annual_stdev = .15
start_stock_price = 100

numYears = 3
numSimulations = 4
stock_price_array = np.empty(numYears)

# draw an annual return from a normal distribution; this annual return will be random
annual_ret = np.random.normal(annual_mean, annual_stdev, numSimulations)

for i in range(numYears):
    stock_price = np.multiply(start_stock_price, (1 + annual_ret[i]))
    np.append(stock_price_array, [stock_price])
    start_stock_price = stock_price


Upvotes: 0

Views: 127

Answers (1)

chrslg

Reputation: 13346

The first rule of NumPy is: never iterate over your array yourself. Use NumPy functions that do the whole computation in batch. (They do iterate over the array internally, of course, but that iteration does not happen in Python, so it is much faster.)

No-for solution

For example, here you could do something like this:

np.cumprod(np.hstack([start_stock_price, annual_ret+1]))

This first builds an array made of the initial value followed by the growth factors. So if the initial value is 100 and the annual returns are 0.1, -0.1, 0.2, 0.2 (for example), then hstack builds an array with the values 100, 1.1, 0.9, 1.2, 1.2.

And cumprod then just builds the cumulative product of those:

100, 100×1.1=110, 100×1.1×0.9=110×0.9=99, 100×1.1×0.9×1.2=99×1.2=118.8, 100×1.1×0.9×1.2×1.2=118.8×1.2=142.56
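As a quick check of the arithmetic above, here is a minimal sketch that runs the same hstack/cumprod expression on the illustrative returns from the example (0.1, -0.1, 0.2, 0.2). The name prices is only introduced here for illustration.

```python
import numpy as np

start_stock_price = 100
# Illustrative returns taken from the worked example, not random draws
annual_ret = np.array([0.1, -0.1, 0.2, 0.2])

# Initial value followed by the growth factors: 100, 1.1, 0.9, 1.2, 1.2
factors = np.hstack([start_stock_price, annual_ret + 1])

# Running product: each entry is the previous price times the next factor
prices = np.cumprod(factors)
print(prices)  # approximately 100, 110, 99, 118.8, 142.56 (up to float rounding)
```

Note that the first element of the result is the starting price itself, so the array has one more entry than there are returns.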

Correction of your code

To answer your original question anyway (even though I strongly advise using a solution like the cumprod one I've shown), you have 2 choices:

  • Either you allocate the array in advance, as you did (your stock_price_array = np.empty(numYears)). Then, instead of trying to append the new stock_price to stock_price_array, you simply fill one of the empty slots that are already there: stock_price_array[i] = stock_price

  • Or you don't. Then you replace the np.empty line with stock_price_array = []. And at each step you append the result, which creates a new stock_price_array, like this: stock_price_array = np.append(stock_price_array, [stock_price])
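The first option can be sketched roughly as follows. This is only an illustration under a few assumptions that are not in the question: it draws numYears returns (rather than numSimulations), uses a seeded np.random.default_rng generator so the run is repeatable, and keeps the running price in a separate stock_price variable instead of overwriting start_stock_price.

```python
import numpy as np

annual_mean = 0.06
annual_stdev = 0.15
start_stock_price = 100
numYears = 3

# Assumption: a seeded generator, only so that the sketch is reproducible
rng = np.random.default_rng(0)
annual_ret = rng.normal(annual_mean, annual_stdev, numYears)

# Allocate once, then fill the existing slots by index
stock_price_array = np.empty(numYears)
stock_price = start_stock_price
for i in range(numYears):
    stock_price = stock_price * (1 + annual_ret[i])
    stock_price_array[i] = stock_price  # assignment instead of np.append

print(stock_price_array)
```

The loop result should match start_stock_price * np.cumprod(1 + annual_ret), which is a handy way to check it.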

I strongly advise against the 2nd solution. Since you already know the final size of the array, it is much better to create it once, because np.append creates a brand new array and then copies the input data into it. It does not just extend the existing array (generally speaking, we can't do that anyway). That is also why your original loop seemed to do nothing: the new array returned by np.append was never assigned to anything.

But, well, anyway, I advise against both solutions, since I find mine (with cumprod) preferable. "for" is the taboo word in NumPy. And it is even more so when what is inside the for is the creation of a new array, as with append.
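To see the copying behaviour of np.append in isolation, here is a small sketch. The arrays a and b are made-up names for illustration only.

```python
import numpy as np

a = np.array([1.0, 2.0])

# np.append returns a new, longer array; it does not modify `a`
b = np.append(a, [3.0])

print(a.shape)                 # a still has 2 elements
print(b.shape)                 # b has 3 elements
print(np.shares_memory(a, b))  # False: b is a fresh copy
```

So calling np.append without keeping its return value leaves the original array exactly as it was.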

Monte-Carlo

Since you've mentioned Monte Carlo, and then shown code that computes only one result (you draw 1 set of annual returns and perform one computation of future values), I am wondering if that is really what you want. In particular, I see that you have numSimulations and numYears, which appear to play redundant roles in your code (and therefore in mine). The only reason it doesn't throw an index error is that numSimulations is used only to decide how many annual_ret values you draw. And since numSimulations > numYears, you have more than enough annual_ret values to compute the result.

Wasn't your initial intention to repeat the simulation over the years numSimulations times, to get numSimulations results?

In which case, you probably need numSimulations sets of numYears annual returns, so a 2D array. Likewise, you should be computing numSimulations series of numYears results.

If my guess is not completely off, I surmise that what you really wanted was something along the lines of:

# 2D array of annual returns: 1 simulation per row, 1 year per column
annual_ret = np.random.normal(annual_mean, annual_stdev, (numSimulations, numYears))

# Add 1 as we did earlier, and pad each row with an initial 100 (`start_stock_price`) at the beginning of each simulation
t = np.pad(annual_ret+1, ((0,0), (1,0)), constant_values=start_stock_price)

# Cumulative product. `axis=1` means it is done along axis 1 (along years) for each row (for each simulation)
res = np.cumprod(t, axis=1)
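Put together as a self-contained script, the 2D version might look like the sketch below. The seeded np.random.default_rng generator and the final_prices name are assumptions added here for reproducibility and illustration; the pad and cumprod steps are the same as above.

```python
import numpy as np

annual_mean = 0.06
annual_stdev = 0.15
start_stock_price = 100
numYears = 3
numSimulations = 4

# Assumption: a seeded generator, so that repeated runs give the same draws
rng = np.random.default_rng(42)

# One row per simulation, one column per year
annual_ret = rng.normal(annual_mean, annual_stdev, (numSimulations, numYears))

# Prepend the starting price to every row, then take the running product along the years
t = np.pad(annual_ret + 1, ((0, 0), (1, 0)), constant_values=start_stock_price)
res = np.cumprod(t, axis=1)

# Last column holds the simulated price at the end of the horizon
final_prices = res[:, -1]
print(res.shape)            # (4, 4): numSimulations rows, numYears + 1 columns
print(final_prices.mean())  # average simulated final price
```

Each row of res is one simulated price path starting at start_stock_price, so statistics across simulations are just reductions along axis 0.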

Upvotes: 2
