How to group by number of bins a ordered dataframe?

Question

I have a dataframe like that:

year	count_yes	count_no
1900	5	7
1903	5	3
1915	14	6
1919	6	14

I want to have two bins, independently of the value itself.

How can I group those categories and sum its values?

Expected result:

year	count_yes	count_no
1900	10	10
1910	20	20

Logic: Grouped the first two rows (1900 and 1903) and the two last rows (1915 and 1919) and summed the values of each category

I want to create a stacked percentage column graphic, so 1900 would be 50/50% and 1910 would be also 50/50%.

I've already created the function to build this graphic, I just need to adjust the dataframe size into bins to create a better distribution and visualization

mozway · Accepted Answer

You can bin the years with pandas.cut and aggregate with groupby+sum:

bins = list(range(1900, df['year'].max()+10, 10))
group = pd.cut(df['year'], bins=bins, labels=bins[:-1], right=False)

df.drop('year', axis=1).groupby(group).sum().reset_index()

If you only want to specify the number of bins, compute group with:

group = pd.cut(df['year'], bins=2, right=False)

output:

   year  count_yes  count_no
0  1900         10        10
1  1910         20        20

How to group by number of bins a ordered dataframe?

Answers (2)

Related Questions