Reputation: 55
I am trying to sum a column of values based upon the binning of another column.
df = pd.read_csv("data_file.csv", na_values=' ')
hincp = df.HINCP.dropna()
grouping = pd.qcut(hincp, 3, labels=["low", "medium", "high"])
I want to sum the values of another column df['WGTP']
based on the "low", "medium", and "high" bins and display the count for each bin. Like so:
WGTP_count
low 12300
medium 3455
high 65500
(Hypothetical numbers) Would I use groupby? Value_counts? I really have no idea how to go about this.
Upvotes: 1
Views: 622
Reputation: 13334
I made the code more generic so it can serve others, I'll let you adapt the column names to match your example:
to_bins
is the column on which you create the quantiles with .qcut
to_sum
are the values you want to sum up based on your quantilesInitial df:
import pandas as pd
cols = ['to_bins', 'to_sums']
df = pd.DataFrame([[1, 10], [2, 20], [3, 30], [1, 10],[2, 20], [3, 30]], columns=cols)
Output:
to_bins to_sums
0 1 10
1 2 20
2 3 30
3 1 10
4 2 20
5 3 30
Creating quantiles with qcut
and assigning to bins
column:
df['bins'] = pd.qcut(df.to_bins, 3, labels=['low', 'medium', 'high']).values
Output:
to_bins to_sums bins
0 1 10 low
1 2 20 medium
2 3 30 high
3 1 10 low
4 2 20 medium
5 3 30 high
Summing up values as per those bins (.reset_index()
is used to convert the output of a groupby back into a dataframe you can further manipulate if you want):
df.groupby(['bins'])['to_sums'].sum().reset_index()
Output:
bins to_sums
0 low 20
1 medium 40
2 high 60
Upvotes: 1