taylor
taylor

Reputation: 55

Display sum of values for qcut() bin label

I am trying to sum a column of values based upon the binning of another column.

df = pd.read_csv("data_file.csv",  na_values=' ')
hincp = df.HINCP.dropna()
grouping = pd.qcut(hincp, 3, labels=["low", "medium", "high"])

I want to sum the values of another column df['WGTP'] based on the "low", "medium", and "high" bins and display the count for each bin. Like so:

         WGTP_count
low           12300
medium         3455
high          65500

(Hypothetical numbers) Would I use groupby? Value_counts? I really have no idea how to go about this.

Upvotes: 1

Views: 622

Answers (1)

Max
Max

Reputation: 13334

I made the code more generic so it can serve others, I'll let you adapt the column names to match your example:

  • to_bins is the column on which you create the quantiles with .qcut
  • to_sum are the values you want to sum up based on your quantiles

Initial df:

import pandas as pd

cols = ['to_bins', 'to_sums']
df = pd.DataFrame([[1, 10], [2, 20], [3, 30], [1, 10],[2, 20], [3, 30]], columns=cols)

Output:

   to_bins to_sums
0       1      10
1       2      20
2       3      30
3       1      10
4       2      20
5       3      30

Creating quantiles with qcut and assigning to bins column:

df['bins'] = pd.qcut(df.to_bins, 3, labels=['low', 'medium', 'high']).values

Output:

   to_bins  to_sums    bins
0        1       10     low
1        2       20  medium
2        3       30    high
3        1       10     low
4        2       20  medium
5        3       30    high

Summing up values as per those bins (.reset_index() is used to convert the output of a groupby back into a dataframe you can further manipulate if you want):

df.groupby(['bins'])['to_sums'].sum().reset_index()

Output:

     bins  to_sums
0     low      20
1  medium      40
2    high      60

Upvotes: 1

Related Questions