Group by aggregation for missing values in range

Question

I have a pandas dataframe test whose values I would like to turn into percentiles of all integers in categories, for example:

import pandas as pd

categories = [0,1,2,3,4,5,6,7,8,9,10]


test  
id  value
foo  0
foo  0
foo  1
foo  1
foo  5
foo  4
foo  4
foo  4
foo  3
foo  3
bar  2
bar  2
bar  2
bar  2
bar  2
bar  6
bar  6
bar  6
bar  6
bar  6

The problem I am having is mapping 0 percentiles to all possible integers in categories. I when I try

test.groupby('id')['value'].apply(lambda x: x.value_counts(normalize=True)).unstack().fillna(0)

The following dataframe is returned, but it is missing values 7, 8, 9, 10, etc. because they are not contained in each id:

    0   1   2   3   4   5   6
id                          
bar 0.0 0.0 0.5 0.0 0.0 0.0 0.5
foo 0.2 0.2 0.0 0.2 0.3 0.1 0.0

Is there a efficient way of adding all values of catgories into the value_count aggregation function so that the following result is returned?

    0   1   2   3   4   5   6   7   8   9  10
foo 0.2 0.2 0.0 0.2 0.3 0.1 0.5 0   0   0   0
bar 0.0 0.0 0.5 0.0 0.0 0.0 0.0 0   0   0   0

Quang Hoang · Accepted Answer

Try with, not necessarily more efficient but more readable:

(pd.crosstab(df['id'], df['value'])
   .reindex(categories, axis=1, fill_value=0)
)

Group by aggregation for missing values in range

Answers (2)

`Categorical`

Over engineered

Related Questions