user15155674

Advanced use of a for loop

I am confused about how to use a for loop in a more advanced way, and I am trying to make the flow clear.

For example, I have a main data frame like this:

   A  B  C  D  ... year
0  0  1  1  2  ... 1991
1  0  0  0  1  ... 1993
2  1  0  0  0  ... 1994
3  0  1  1  0  ... 1995

I already have a table with the percentage of each element, like this:

  index   value
0   A     0.002
1   B     0.012
2   C     0.035
3   D     0.005
...

I want to calculate conditional probabilities for each of the keys like this:

  key1  key2  year  prob
0  A     B    1991  0.135
1  A     C    1993  0.500
2  A     B    1994  0.354
3  A     A    1991  1.000

I am confused about the year column and the elements. How can I use a for loop to extract elements from the columns of the main data frame?

There's another approach I am thinking about, but I don't know how to start. I would get key1 and key2 from the percentage table, then create a range for the year (range(1983, …)), and then get the prob from a conditional probability function.

def condprobability(frame, column1, column2, year):
    for i in range(1991,1992,1993,1994,1995):
        

I'm stuck here. Could I get some hints or resources on this?

Upvotes: 0

Views: 159

Answers (2)

Ynjxsjmh

Reputation: 29992

You can use pandas.DataFrame.iterrows() to iterate over each row and calculate the conditional probability between each pair of columns.

import pandas as pd

# Index the percentage table by element name so values can be looked up by key
percent_df.set_index('index', inplace=True)

columns_to_cal_cond_prob = ['A', 'B', 'C', 'D']
cond_probs = []

for index, row in main_df.iterrows():
    for col1 in columns_to_cal_cond_prob:
        for col2 in columns_to_cal_cond_prob:
            value1 = main_df.loc[index, col1]
            value2 = main_df.loc[index, col2]

            # Implement your conditional probability calculation here;
            # the sum below is only a placeholder
            cond_prob = percent_df.loc[col1, 'value'] + percent_df.loc[col2, 'value']

            cond_probs.append([col1, col2, main_df.loc[index, 'year'], cond_prob])

cond_prob_df = pd.DataFrame(cond_probs, columns=['key1', 'key2', 'year', 'prob'])
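
One possible way to fill in the placeholder, as a minimal sketch only. It assumes that an element "occurs" in a row when its value is non-zero, and that the probability of key2 given key1 is estimated separately for each year from the rows of the main data frame. The question does not define the calculation, so this definition, the function name cond_prob_by_year, and the sample data are all assumptions made for illustration.

```python
from itertools import product

import pandas as pd


def cond_prob_by_year(frame, keys, year_col="year"):
    """Estimate P(key2 > 0 | key1 > 0) for every key pair within each year.

    Assumption: an element "occurs" in a row when its value is non-zero.
    """
    records = []
    for year, group in frame.groupby(year_col):
        # Boolean frame: does each element occur in each row of this year?
        present = group[keys] > 0
        for key1, key2 in product(keys, repeat=2):
            n_key1 = present[key1].sum()
            if n_key1 == 0:
                # Undefined when key1 never occurs in this year, so skip it
                continue
            n_both = (present[key1] & present[key2]).sum()
            records.append([key1, key2, year, n_both / n_key1])
    return pd.DataFrame(records, columns=["key1", "key2", "year", "prob"])


# Made-up sample data in the shape of the question's main data frame
main_df = pd.DataFrame({
    "A": [0, 1, 1, 0],
    "B": [1, 1, 0, 1],
    "C": [1, 0, 0, 1],
    "D": [2, 0, 1, 0],
    "year": [1991, 1991, 1994, 1995],
})

cond_prob_df = cond_prob_by_year(main_df, ["A", "B", "C", "D"])
print(cond_prob_df)
```

Grouping by year replaces the hand-written range over years, so only years that actually appear in the data are visited. Under this definition a key paired with itself always gives 1.0, which matches the A, A row in the expected output.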

Upvotes: 0

bfvtv vxyfbd

Reputation: 57

If you are using pandas, start with loc and iloc.
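
As a rough illustration of the difference: loc selects by label and iloc selects by integer position. The sample data below is made up to resemble the question's main data frame.

```python
import pandas as pd

# Made-up sample data in the shape of the question's main data frame
main_df = pd.DataFrame({
    "A": [0, 0, 1, 0],
    "B": [1, 0, 0, 1],
    "year": [1991, 1993, 1994, 1995],
})

# loc selects by label: row label 2, column "year"
year_by_label = main_df.loc[2, "year"]

# iloc selects by integer position: third row, first column
a_by_position = main_df.iloc[2, 0]

# loc also accepts a boolean mask, e.g. columns A and B for rows from 1993 onwards
recent_rows = main_df.loc[main_df["year"] >= 1993, ["A", "B"]]

print(year_by_label, a_by_position)
print(recent_rows)
```

With the default integer index the row label and the row position happen to coincide, but they diverge as soon as the frame is filtered or re-indexed.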

Upvotes: 2
