Reputation: 175
I am running a simple python script for MC. Basically it reads through every row in the dataframe and selects the max and min of the two variables. Then the simulation if run 1000 times selecting a random value between the min and max and computes the product and writes the P50 value back to the datatable.
Somehow the P50 output is the same for all rows. Any help on where I am going wrong?
import pandas as pd
import random
import numpy as np
data = [[0.075,0.085, 120, 150], [0.055, 0.075, 150, 350],[0.045,0.055,175,400]]
df = pd.DataFrame(data, columns = ['P_min','P_max','H_min','H_max'])
NumSim = 1000
for index, row in df.iterrows():
outdata = np.zeros(shape=(NumSim,), dtype=float)
for k in range(NumSim):
phi = (row['P_min'] + (row['P_max'] - row['P_min']) * random.uniform(0, 1))
ht = (row['H_min'] + (row['H_max'] - row['H_min']) * random.uniform(0, 1))
outdata[k] = phi*ht
df['out_p50'] = np.percentile(outdata,50)
print(df)
Upvotes: 0
Views: 405
Reputation: 77827
Yup -- you're writing a scalar value to the entire column. You overwrite that value on each iteration. If you want, you can simply specify the row with df.loc
for a quick fix. Also consider using outdata.median
instead of percentile
.
Perhaps the most important feature of PANDAS is the built-in support for vectorization: you work with entire columns of data, rather than looping through the data frame. Think like a list comprehension in which you don't need the for row in df
iteration at the end.
Upvotes: 0
Reputation: 333
By df['out_p50'] = np.percentile(outdata,50)
you are saying that you want the whole column to be set to given value, not a specific row of the column. Therefore, the numbers are generated and saved but they are saved to the whole column and in the end, you see the last generated number in every row.
Instead, use df.loc[index, 'out_p50'] = np.percentile(outdata,50)
to specify the specific row you want to set.
Upvotes: 1