Reputation: 877
I have a Pandas data frame like this:
df =
A B C
A1 B1 20
A1 B2 4
A1 B3 1
A2 B2 6
A3 ... ...
... ... ...
For a fixed set of values B = [B1, B2, B3], I want to convert C into a distribution within each group of A.
Desired Output =
A B C
A1 [B1 B2 B3] [0.8 0.16 0.04]
A2 [B1 B2 B3] [0.0 1.0 0.0]
A3 [B1 B2 B3] ...
How do I handle values of B that are missing from a group?
Upvotes: 1
Views: 32
Reputation: 863301
Use DataFrame.pivot, select the columns listed in B, fill the missing combinations with 0, divide each row by its sum, and finally build the output with the DataFrame constructor:
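import pandas as pd

B = ['B1', 'B2', 'B3']
# One row per A, one column per B; absent (A, B) pairs become NaN, then 0.
# pandas 2.0+ requires keyword arguments for pivot.
df = df.pivot(index='A', columns='B', values='C')[B].fillna(0)
# Divide each row by its total so the values sum to 1.
df1 = df.div(df.sum(axis=1), axis=0)
df2 = pd.DataFrame({'A': df1.index, 'B': [B] * len(df1), 'C': df1.to_numpy().tolist()})
print(df2)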
A B C
0 A1 [B1, B2, B3] [0.8, 0.16, 0.04]
1 A2 [B1, B2, B3] [0.0, 1.0, 0.0]
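For reference, here is a self-contained sketch of the same approach that can be run as-is. It uses only the fully specified rows from the question as sample data. The function name to_distribution is hypothetical, and reindex(columns=...) is swapped in for the [B] selection on the assumption that a category in B might be absent from the whole frame, in which case [B] would raise a KeyError.

```python
import pandas as pd

# Sample data: only the fully specified rows from the question.
df = pd.DataFrame({
    "A": ["A1", "A1", "A1", "A2"],
    "B": ["B1", "B2", "B3", "B2"],
    "C": [20, 4, 1, 6],
})

B = ["B1", "B2", "B3"]


def to_distribution(frame, categories):
    """Hypothetical helper: per-A distribution of C over a fixed list of B values."""
    wide = (
        frame.pivot(index="A", columns="B", values="C")
        # reindex keeps the requested column order and adds a NaN column
        # for any category that never occurs in the data.
        .reindex(columns=categories)
        .fillna(0)
    )
    # Divide each row by its total so the row sums to 1.
    probs = wide.div(wide.sum(axis=1), axis=0)
    return pd.DataFrame({
        "A": probs.index,
        "B": [list(categories)] * len(probs),
        "C": probs.to_numpy().tolist(),
    })


result = to_distribution(df, B)
print(result)
```

Two caveats: pivot raises a ValueError if the same (A, B) pair occurs more than once, so pivot_table with aggfunc='sum' may be a better fit for such data; and a group whose C values sum to 0 ends up with NaN rather than a distribution.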
Upvotes: 1