Reputation: 141
I have a dataframe like that:
year | count_yes | count_no |
---|---|---|
1900 | 5 | 7 |
1903 | 5 | 3 |
1915 | 14 | 6 |
1919 | 6 | 14 |
I want to have two bins, independently of the value itself.
How can I group those categories and sum its values?
Expected result:
year | count_yes | count_no |
---|---|---|
1900 | 10 | 10 |
1910 | 20 | 20 |
Logic: Grouped the first two rows (1900 and 1903) and the two last rows (1915 and 1919) and summed the values of each category
I want to create a stacked percentage column graphic, so 1900 would be 50/50% and 1910 would be also 50/50%.
I've already created the function to build this graphic, I just need to adjust the dataframe size into bins to create a better distribution and visualization
Upvotes: 0
Views: 69
Reputation: 404
This is a way to do what you need, if you are ok using the decades as index:
df['year'] = (df.year//10)*10
df_group = df.groupby('year').sum()
Output>>>
df_group
count_yes count_no
year
1900 10 10
1910 20 20
Upvotes: 2
Reputation: 260470
You can bin the years with pandas.cut
and aggregate with groupby
+sum
:
bins = list(range(1900, df['year'].max()+10, 10))
group = pd.cut(df['year'], bins=bins, labels=bins[:-1], right=False)
df.drop('year', axis=1).groupby(group).sum().reset_index()
If you only want to specify the number of bins, compute group
with:
group = pd.cut(df['year'], bins=2, right=False)
output:
year count_yes count_no
0 1900 10 10
1 1910 20 20
Upvotes: 1