jnc
jnc

Reputation: 57

pandas: groupby multiple columns. How to get all combinations?

I have the following dataframe describing persons, where age_range has been computed from the age column

   age  gender group    age_range
0   46  F      1       >= 30 and < 60
1   50  F      1       >= 30 and < 60
2   63  F      2       >= 60
3   65  F      2       >= 60
4   34  F      1       >= 30 and < 60
5   42  F      2       >= 30 and < 60
6   55  F      1       >= 30 and < 60
7   57  M      1       >= 30 and < 60

From there, I would like a table giving for each group, all age_ranges and the number of persons from the group and the age_range, even for empty bins (no person < 30 in any of the two groups and all members of group 1 are in >= 30 and < 60).

How could I get something like the following?

    group   age_range       count
0   1       < 30            0
1   1       >= 30 and < 60  5
2   1       >= 60           0
3   2       < 30            0
4   2       >= 30 and < 60  1
5   2       >= 60           2

I tried multiple groupby options but never managed to get a complete table.

Upvotes: 4

Views: 5195

Answers (1)

Ken Wei
Ken Wei

Reputation: 3130

To get the non-zero counts:

g = df.groupby(['group','age_range']).size().to_frame('count').reset_index()

Then, you'll want a dummy table of all the group and age range combinations:

from itertools import product
combs = pd.DataFrame(list(product(df['group'], df['age_range'])), 
                     columns=['group', 'age_range'])

Merge (SQL join) g with this, and replace the NaNs with zeros:

result = g.merge(combs, how = 'right').fillna(0)

A one-liner solution:

df.groupby(['group','age_range']).size().to_frame('count').reindex(MultiIndex.from_product([df['group'].unique(), df['age_range'].unique()], fill_value = 0))

Upvotes: 3

Related Questions