I have a simple dataframe like the one below.
How can I compute the probability of a 1 occurring in Column_1 for each combination of Column_2 and Column_3?
Column_1 is a result (either one or zero). Column_2 and Column_3 are a kind of classification.
So the first row means the result is 1 for a person who lives in building A and whose car is model LM.
Column_1 Column_2 Column_3
1 A LM
1 B LO
0 C LP
1 D LM
0 A LK
1 A LM
If I understand correctly, the result could be:
     LM    LO    LP   LK
A   .33                0
B         .167
C                0
D  .167
Upvotes: 2
Views: 742
Reputation: 862511
You can use pivot_table:
print (df.pivot_table(index='Column_2',
                      columns='Column_3',
                      values='Column_1',
                      aggfunc='sum',
                      fill_value=0))
Column_3 LK LM LO LP
Column_2
A 0 2 0 0
B 0 0 1 0
C 0 0 0 0
D 0 1 0 0
Another solution uses groupby and unstack:
df1 = df.groupby(['Column_2','Column_3'])['Column_1'].sum().unstack(fill_value=0)
print (df1)
Column_3 LK LM LO LP
Column_2
A 0 2 0 0
B 0 0 1 0
C 0 0 0 0
D 0 1 0 0
Finally, use div to divide by the length of the index, which is the number of rows in df:
print (df1.div(len(df.index)))
Column_3 LK LM LO LP
Column_2
A 0.0 0.333333 0.000000 0.0
B 0.0 0.000000 0.166667 0.0
C 0.0 0.000000 0.000000 0.0
D 0.0 0.166667 0.000000 0.0
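For anyone who wants to run the steps end to end, here is a minimal self-contained sketch. It assumes the sample data from the question is built by hand with pd.DataFrame; the names counts, counts_gb and probs are only illustrative and do not appear above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Column_1": [1, 1, 0, 1, 0, 1],
    "Column_2": ["A", "B", "C", "D", "A", "A"],
    "Column_3": ["LM", "LO", "LP", "LM", "LK", "LM"],
})

# Count the ones per (Column_2, Column_3) pair; pairs that never occur become 0.
counts = df.pivot_table(index="Column_2", columns="Column_3",
                        values="Column_1", aggfunc="sum", fill_value=0)

# The same counts via groupby + unstack.
counts_gb = (df.groupby(["Column_2", "Column_3"])["Column_1"]
               .sum()
               .unstack(fill_value=0))

# Divide by the total number of rows in df.
probs = counts.div(len(df))
print(probs)
```

Note that dividing by len(df) gives each cell's share of all rows (for example 2 of 6 rows are A/LM with a 1), not the rate of ones within each group.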
Upvotes: 1