silent_hunter
silent_hunter

Reputation: 2508

Setting title after aggregation

I am trying to aggregate and count values together. Below you can see my dataset

data = {'id':['1','2','3','4','5'],
        'name': ['Company1', 'Company1', 'Company3', 'Company3', 'Company5'], 
        'sales': [10, 3, 5, 1, 0], 
        'income': [10, 3, 5, 1, 0], 
       }
df = pd.DataFrame(data, columns = ['id','name', 'sales','income'])

conditions = [
    (df['sales'] < 1),
    (df['sales'] >= 1) & (df['sales'] < 3),
    (df['sales'] >= 3) & (df['sales'] < 5),
    (df['sales'] >= 5)
    ]


values = ['<1', '1-3', '3-5', '>= 5']

df['range'] = np.select(conditions, values)

df=df.groupby('range')['sales','income'].agg(['count','sum']).reset_index()   

This code gives me the next table

enter image description here

But I am not satisfied with the appearance of this table because 'count' is duplicated two times. So can anybody help me with this table in order to have separate columns 'range', 'count', 'income' and 'sales'.

Upvotes: 0

Views: 98

Answers (1)

user7864386
user7864386

Reputation:

You could try named aggregation:

df.groupby('range', as_index=False).agg(count=('range','count'), sales=('sales','sum'), income=('income','sum'))

Output:

  range  count  sales  income
0   1-3      1      1       1
1   3-5      1      3       3
2    <1      1      0       0
3  >= 5      2     15      15

P.S. You probably want to make "range" a categorical variable, so that the output is sorted in the correct order:

df['range'] = pd.Categorical(np.select(conditions, values), categories=values, ordered=True)

Then the above code outputs:

  range  count  sales  income
0    <1      1      0       0
1   1-3      1      1       1
2   3-5      1      3       3
3  >= 5      2     15      15

Upvotes: 1

Related Questions