learner

Pandas: convert a column of counts with missing values into per-group distributions

I have a Pandas data frame like this:

  df = 

        A             B                  C
        A1            B1                 20
        A1            B2                 4
        A1            B3                 1
        A2            B2                 6
        A3            ...              ...
        ...           ...               ...
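To make the example reproducible, here is a minimal constructor for the rows that are fully shown above. The A3 row and everything after it are elided, so they are left out, and integer counts in C are an assumption.

```python
import pandas as pd

# Only the fully visible rows from the table above; A3 and later rows are elided there
df = pd.DataFrame({
    'A': ['A1', 'A1', 'A1', 'A2'],
    'B': ['B1', 'B2', 'B3', 'B2'],
    'C': [20, 4, 1, 6],  # counts, assumed to be integers
})
print(df)
```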

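For the fixed set of B values [B1, B2, B3], I want to convert C into a distribution within each A group: every count is divided by that group's total, and a B value missing from a group should get 0.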

Desired output:

       A              B                  C
       A1            [B1 B2 B3]          [0.8 0.16 0.04]
       A2            [B1 B2 B3]          [0.0 1.0   0.0]
       A3            [B1 B2 B3]           ...
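The numbers in the desired output follow from dividing each count by its group total, with a missing (A, B) pair counted as 0. Writing $c_{a,b}$ for the count of pair $(a, b)$:

$$p_{a,b} = \frac{c_{a,b}}{\sum_{b' \in \{B1, B2, B3\}} c_{a,b'}}, \qquad c_{a,b} = 0 \text{ if } (a, b) \text{ has no row}$$

For A1 the total is 20 + 4 + 1 = 25, giving 20/25 = 0.8, 4/25 = 0.16 and 1/25 = 0.04. For A2 the only count is 6 (for B2), giving [0/6, 6/6, 0/6] = [0.0, 1.0, 0.0].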

How do I handle the B values that are missing from a group (for example, A2 has a row only for B2)?

Answers (1)

jezrael

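Use DataFrame.pivot to get one column per B value, select the columns in B (in that order), replace the NaNs left by missing (A, B) pairs with 0, divide each row by its sum, and finally build the output DataFrame with the constructor: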

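import pandas as pd

B = ['B1', 'B2', 'B3']

# pandas >= 2.0 requires keyword arguments for pivot; absent (A, B) pairs become NaN.
# reindex (instead of [B]) also adds a B value that never occurs in df rather than raising KeyError
wide = df.pivot(index='A', columns='B', values='C').reindex(columns=B).fillna(0)
# divide every row by its total so each row sums to 1
df1 = wide.div(wide.sum(axis=1), axis=0)
# one row per A: the fixed label list in B, the probabilities as a plain list in C
df2 = pd.DataFrame({'A': df1.index, 'B': [B] * len(df1), 'C': df1.to_numpy().tolist()})
print(df2)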
    A             B                  C
0  A1  [B1, B2, B3]  [0.8, 0.16, 0.04]
1  A2  [B1, B2, B3]    [0.0, 1.0, 0.0]
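If df can contain duplicate (A, B) pairs, pivot raises a ValueError because it cannot reshape duplicate index entries. A sketch of a variant that sums duplicates first, using groupby plus unstack(fill_value=0) in place of pivot plus fillna, is below. It uses the visible rows from the question, and summing duplicate pairs is an assumption about the desired behavior.

```python
import pandas as pd

# Visible rows from the question (A3 and later rows are elided there)
df = pd.DataFrame({'A': ['A1', 'A1', 'A1', 'A2'],
                   'B': ['B1', 'B2', 'B3', 'B2'],
                   'C': [20, 4, 1, 6]})
B = ['B1', 'B2', 'B3']

# Sum counts per (A, B) pair; unlike pivot, this tolerates duplicate pairs
counts = df.groupby(['A', 'B'])['C'].sum()
# Spread B into columns: pairs missing within a group become 0,
# and B values that never occur at all are added as all-zero columns
wide = counts.unstack(fill_value=0).reindex(columns=B, fill_value=0)
# Divide each row by its total so every row sums to 1
probs = wide.div(wide.sum(axis=1), axis=0)

out = pd.DataFrame({'A': probs.index,
                    'B': [B] * len(probs),
                    'C': probs.to_numpy().tolist()})
print(out)
```

Because unstack fills the gaps while reshaping, no separate fillna step is needed here.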
