Reputation: 57
I have the following dataframe describing persons, where age_range has been computed from the age column
age gender group age_range
0 46 F 1 >= 30 and < 60
1 50 F 1 >= 30 and < 60
2 63 F 2 >= 60
3 65 F 2 >= 60
4 34 F 1 >= 30 and < 60
5 42 F 2 >= 30 and < 60
6 55 F 1 >= 30 and < 60
7 57 M 1 >= 30 and < 60
From there, I would like a table giving for each group, all age_ranges and the number of persons from the group and the age_range, even for empty bins (no person < 30 in any of the two groups and all members of group 1 are in >= 30 and < 60).
How could I get something like the following?
group age_range count
0 1 < 30 0
1 1 >= 30 and < 60 5
2 1 >= 60 0
3 2 < 30 0
4 2 >= 30 and < 60 1
5 2 >= 60 2
I tried multiple groupby options but never managed to get a complete table.
Upvotes: 4
Views: 5195
Reputation: 3130
To get the non-zero counts:
g = df.groupby(['group','age_range']).size().to_frame('count').reset_index()
Then, you'll want a dummy table of all the group and age range combinations:
from itertools import product
combs = pd.DataFrame(list(product(df['group'], df['age_range'])),
columns=['group', 'age_range'])
Merge (SQL join) g
with this, and replace the NaNs with zeros:
result = g.merge(combs, how = 'right').fillna(0)
A one-liner solution:
df.groupby(['group','age_range']).size().to_frame('count').reindex(MultiIndex.from_product([df['group'].unique(), df['age_range'].unique()], fill_value = 0))
Upvotes: 3