I am confused about how to use nested for loops here and am trying to make the flow clear.
For example, I have a main DataFrame like this:
   A  B  C  D  ...  year
0  0  1  1  2  ...  1991
1  0  0  0  1  ...  1993
2  1  0  0  0  ...  1994
3  0  1  1  0  ...  1995
I already have a table with the percentage for each element, like this:
  index  value
0     A  0.002
1     B  0.012
2     C  0.035
3     D  0.005
...
I want to calculate conditional probabilities for each pair of keys, like this:
  key1 key2  year   prob
0    A    B  1991  0.135
1    A    C  1993  0.500
2    A    B  1994  0.354
3    A    A  1991  1.000
I am confused about how to handle the year column together with the elements. How can I use a for loop to extract the elements from the columns of the main DataFrame?
There's another approach I am considering, but I don't know how to start. I would get key1 and key2 from the percentage table, create a range for the years (range(1983, …)), and then get the prob from a conditional probability function.
def condprobability(frame, column1, column2, year):
    for i in range(1991,1992,1993,1994,1995):
I'm stuck here. May I ask for some hints or resources about it?
Upvotes: 0
Views: 159
Reputation: 29992
You can use pandas.DataFrame.iterrows() to loop over each row and calculate the conditional probability between each pair of columns.
import pandas as pd

# Index the percentage table by element name so values can be looked up by key
percent_df.set_index('index', inplace=True)
columns_to_cal_cond_prob = ['A', 'B', 'C', 'D']
cond_probs = []
for index, row in main_df.iterrows():
    # Visit every ordered pair of columns (col1, col2) for this row
    for col1 in columns_to_cal_cond_prob:
        for col2 in columns_to_cal_cond_prob:
            value1 = main_df.loc[index, col1]
            value2 = main_df.loc[index, col2]
            # Implement your conditional probabilities calculations here
            # (the sum below is only a placeholder, not a real conditional probability)
            cond_prob = percent_df.loc[col1, 'value'] + percent_df.loc[col2, 'value']
            cond_probs.append([col1, col2, main_df.loc[index, 'year'], cond_prob])
cond_prob_df = pd.DataFrame(cond_probs, columns=['key1', 'key2', 'year', 'prob'])
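If you prefer the function-based approach from the question, here is a minimal sketch of what it might look like. It rests on assumptions the question does not confirm: that a nonzero cell means the element occurred in that row, and that the probability wanted is P(key2 occurs | key1 occurs) among the rows of a given year. The name cond_probability and the toy data are made up for illustration. itertools.product replaces the three nested loops.

```python
from itertools import product

import pandas as pd


def cond_probability(frame, key1, key2, year):
    """Estimate P(key2 > 0 | key1 > 0) among the rows of one year.

    Assumption: a nonzero cell means the element occurred in that row.
    Returns NaN when key1 never occurs in that year.
    """
    rows = frame[frame["year"] == year]
    given = rows[rows[key1] > 0]
    if given.empty:
        return float("nan")
    # Share of the "key1 occurred" rows in which key2 also occurred
    return float((given[key2] > 0).mean())


# Toy data, made up for illustration (not the asker's real frame)
main_df = pd.DataFrame({
    "A": [0, 1, 1, 0],
    "B": [1, 1, 0, 1],
    "year": [1991, 1991, 1991, 1993],
})
keys = ["A", "B"]
years = sorted(main_df["year"].unique())

# One record per (key1, key2, year) combination
records = [
    (k1, k2, int(year), cond_probability(main_df, k1, k2, year))
    for k1, k2, year in product(keys, keys, years)
]
cond_prob_df = pd.DataFrame(records, columns=["key1", "key2", "year", "prob"])
print(cond_prob_df)
```

Taking the years from the data with unique() avoids hard-coding a range, and a year in which key1 never occurs yields NaN rather than a division by zero. If your definition of conditional probability differs, only the body of the function should need to change.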
Upvotes: 0