Probability in pandas

Question

I have a simple dataframe like the one shown below.

How can I compute the probability of a one occurring in Column_1 for each combination of Column_2 and Column_3?

Column_1 is a result (either one or zero).

Column_2 and Column_3 are classifications.

So the first row means a result of 1 for a person who lives in building A and whose car is model LM.

Column_1 Column_2 Column_3
 1        A         LM
 1        B         LO    
 0        C         LP
 1        D         LM
 0        A         LK
 1        A         LM

If I understand correctly, the result could be:

    LM    LO    LP    LK
A  .33                0
B        .167
C               0
D  .167

jezrael · Accepted Answer

You can use pivot_table:

print (df.pivot_table(index='Column_2', 
                      columns='Column_3', 
                      values='Column_1', 
                      aggfunc='sum', 
                      fill_value=0))
Column_3  LK  LM  LO  LP
Column_2                
A          0   2   0   0
B          0   0   1   0
C          0   0   0   0
D          0   1   0   0

Another solution with groupby and unstack:

df1 = df.groupby(['Column_2','Column_3'])['Column_1'].sum().unstack(fill_value=0)
print (df1)
Column_3  LK  LM  LO  LP
Column_2                
A          0   2   0   0
B          0   0   1   0
C          0   0   0   0
D          0   1   0   0

Finally, use div to divide by the length of the index, which is the number of rows in df:

print (df1.div(len(df.index)))
Column_3   LK        LM        LO   LP
Column_2                              
A         0.0  0.333333  0.000000  0.0
B         0.0  0.000000  0.166667  0.0
C         0.0  0.000000  0.000000  0.0
D         0.0  0.166667  0.000000  0.0
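For reference, the steps above can be put together into one self-contained script. This is only a sketch: the DataFrame is rebuilt by hand from the sample in the question, and the names counts and probs are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data copied by hand from the question.
df = pd.DataFrame({
    "Column_1": [1, 1, 0, 1, 0, 1],
    "Column_2": ["A", "B", "C", "D", "A", "A"],
    "Column_3": ["LM", "LO", "LP", "LM", "LK", "LM"],
})

# Sum the ones for each (Column_2, Column_3) pair; pairs that never
# occur in the data are filled with 0.
counts = df.pivot_table(index="Column_2",
                        columns="Column_3",
                        values="Column_1",
                        aggfunc="sum",
                        fill_value=0)

# Divide every cell by the total number of rows in df.
probs = counts.div(len(df))
print(probs)
```

Note that each cell is the number of ones for that pair divided by the total row count, so the cells add up to the overall share of ones in Column_1 (4 of 6 rows here), not to 1.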
