Dark Knight

Reputation: 175

Monte carlo simulation in python - problem with looping

I am running a simple Python script for a Monte Carlo (MC) simulation. It reads every row of the dataframe and takes the min and max of two variables. The simulation is then run 1000 times per row: each run picks a random value between the min and max of each variable and computes their product. The P50 (median) of those products is written back to the dataframe.

Somehow the P50 output is the same for all rows. Any help on where I am going wrong?

import pandas as pd
import random
import numpy as np

data = [[0.075,0.085, 120, 150], [0.055, 0.075, 150, 350],[0.045,0.055,175,400]]
df = pd.DataFrame(data, columns = ['P_min','P_max','H_min','H_max'])
NumSim = 1000

for index, row in df.iterrows():
    outdata = np.zeros(shape=(NumSim,), dtype=float)
    for k in range(NumSim):
        phi = (row['P_min'] + (row['P_max'] - row['P_min']) * random.uniform(0, 1))
        ht = (row['H_min'] + (row['H_max'] - row['H_min']) * random.uniform(0, 1))
        outdata[k] = phi*ht
    df['out_p50'] = np.percentile(outdata,50)

print(df)

Upvotes: 0

Views: 405

Answers (2)

Prune

Reputation: 77827

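Yup, you're assigning a scalar value to the entire column, and you overwrite it on each iteration, so every row ends up with the value from the last iteration. For a quick fix, you can assign to the specific row with df.loc. Also consider using np.median(outdata) instead of np.percentile(outdata, 50); note that NumPy arrays have no .median method, so it has to be the module-level function.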

Perhaps the most important feature of pandas is its built-in support for vectorization: you work with entire columns of data rather than looping through the data frame. Think of it as a list comprehension in which you don't need to write the "for row in df" iteration at the end.
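As a rough sketch of what a vectorized version could look like (one possible approach, not the only one): draw all random numbers for all rows at once as a 2-D array and take the median along the simulation axis. The names rng, num_sim, u_phi and u_ht are illustrative, and the fixed seed is an assumption added only to make the output reproducible.

```python
import numpy as np
import pandas as pd

data = [[0.075, 0.085, 120, 150], [0.055, 0.075, 150, 350], [0.045, 0.055, 175, 400]]
df = pd.DataFrame(data, columns=['P_min', 'P_max', 'H_min', 'H_max'])
num_sim = 1000

# Assumption: a seeded generator, used here only for reproducibility.
rng = np.random.default_rng(0)

# One uniform draw in [0, 1) per (row, simulation) pair: shape (n_rows, num_sim).
u_phi = rng.random((len(df), num_sim))
u_ht = rng.random((len(df), num_sim))

# Column vectors of shape (n_rows, 1) broadcast against the (n_rows, num_sim) draws.
p_min = df['P_min'].to_numpy()[:, None]
p_max = df['P_max'].to_numpy()[:, None]
h_min = df['H_min'].to_numpy()[:, None]
h_max = df['H_max'].to_numpy()[:, None]

phi = p_min + (p_max - p_min) * u_phi
ht = h_min + (h_max - h_min) * u_ht

# Median over the simulation axis gives one P50 per row.
df['out_p50'] = np.median(phi * ht, axis=1)

print(df)
```

Here the column assignment is correct because the right-hand side is an array with one value per row, not a scalar.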

Upvotes: 0

hmhmmm

Reputation: 333

With df['out_p50'] = np.percentile(outdata,50) you are setting the whole column to the given value, not a specific row of the column. The numbers are generated and saved on every iteration, but each save overwrites the entire column, so in the end you see the last generated number in every row.

Instead, use df.loc[index, 'out_p50'] = np.percentile(outdata,50) to specify the specific row you want to set.
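For reference, here is the question's script with only that one assignment changed. The random.seed call is an assumption added so the run is repeatable; it is not part of the original code.

```python
import random

import numpy as np
import pandas as pd

data = [[0.075, 0.085, 120, 150], [0.055, 0.075, 150, 350], [0.045, 0.055, 175, 400]]
df = pd.DataFrame(data, columns=['P_min', 'P_max', 'H_min', 'H_max'])
NumSim = 1000

random.seed(0)  # assumption: fixed seed for repeatable output

for index, row in df.iterrows():
    outdata = np.zeros(shape=(NumSim,), dtype=float)
    for k in range(NumSim):
        phi = row['P_min'] + (row['P_max'] - row['P_min']) * random.uniform(0, 1)
        ht = row['H_min'] + (row['H_max'] - row['H_min']) * random.uniform(0, 1)
        outdata[k] = phi * ht
    # Write the P50 to this row only, instead of overwriting the whole column.
    df.loc[index, 'out_p50'] = np.percentile(outdata, 50)

print(df)
```

Each row now gets its own P50, which should fall between that row's P_min * H_min and P_max * H_max.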

Upvotes: 1
