O K

Reputation: 13

Add a Total and Count Row to a Dataframe

I have a dataframe as follows:

dashboard = pd.DataFrame({
 'id':[1,2,3,4],
 'category': ['a', 'b', 'a', 'c'],
 'price': [123, 151, 21, 24],
 'description': ['IT related', 'IT related', 'Marketing','']
})

I need to add a row that shows both the row count and the price sum for only some categories, as follows:

pd.DataFrame({
 'id': [3],
 'category': ['a&b'],
 'price': [295],
 'description': ['']
})

Upvotes: 0

Views: 848

Answers (3)

Gustavo Gradvohl

Reputation: 712

Try this:

Code

import numpy as np
import pandas as pd

dashboard = pd.DataFrame({
 'id':[1,2,3,4],
 'category': ['a', 'b', 'a', 'c'],
 'price': [123, 151, 21, 24],
 'description': ['IT related', 'IT related', 'Marketing','']
})

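selection = ['a', 'b']
selection_row = '&'.join(selection)  # label for the summary row, here 'a&b'

# Count the ids and sum the prices of the selected categories. After .T the
# rows are 'id' and 'price', the columns are 'count' and 'sum', and the
# unused cells hold 0.
df2 = dashboard[dashboard['category'].isin(selection)].agg({'id': ['count'], 'price': ['sum']}).fillna(0).T
# Adding the two columns collapses them into a single value per row.
df2['summary'] = df2['count'].add(df2['sum'])

df2.loc['description'] = np.nan
df2.loc['category'] = selection_row

final_df = df2['summary']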

final_df

id               3
price          295
description    NaN
category       a&b
Name: summary, dtype: object

Upvotes: 0

Brendan

Reputation: 4011

An option using .agg:

import pandas as pd

dashboard = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'category': ['a', 'b', 'a', 'c'],
    'price': [123, 151, 21, 24],
    'description': ['IT related', 'IT related', 'Marketing', '']
})
# Use the string 'sum' rather than the builtin sum, which newer pandas versions warn about.
a_b = dashboard[dashboard['category'].isin(['a', 'b'])].agg({'id': 'count', 'price': 'sum'})
df = pd.DataFrame({'a&b':a_b})

yields

       a&b
id       3
price  295

which you could then .transpose() and merge into your existing dataframe if desired, or compile a separate dataframe of summary results, etc.
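
As a minimal sketch of the "transpose and merge" step mentioned above: the aggregated Series is turned into a one-row frame, the two missing columns are filled in, and the row is concatenated onto the original. The names `selection`, `summary_row` and `combined` are introduced here for illustration only, and the empty description is an assumption taken from the expected output in the question.

```python
import pandas as pd

dashboard = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'category': ['a', 'b', 'a', 'c'],
    'price': [123, 151, 21, 24],
    'description': ['IT related', 'IT related', 'Marketing', '']
})

selection = ['a', 'b']  # categories to summarise (illustrative name)
subset = dashboard[dashboard['category'].isin(selection)]
a_b = subset.agg({'id': 'count', 'price': 'sum'})

# Turn the Series (index: id, price) into a one-row frame.
summary_row = a_b.to_frame().T
summary_row['category'] = '&'.join(selection)
summary_row['description'] = ''  # assumption: the summary row has no description

# Reorder the columns to match the original, then add the row at the bottom.
combined = pd.concat([dashboard, summary_row[dashboard.columns]], ignore_index=True)
print(combined)
```

`ignore_index=True` renumbers the rows, so the summary row does not reuse the index label 0.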

Upvotes: 1

Adam.Er8

Reputation: 13403

I pre-calculate the price sum for each category. Then, for each pair of categories, I add the two sums, count the rows that belong to either category, join the category names, and append the result as a new row.

Try this:

import pandas as pd

dashboard = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'category': ['a', 'b', 'a', 'c'],
    'price': [123, 151, 21, 24],
    'description': ['IT related', 'IT related', 'Marketing', '']
})

pairs = [('a', 'b')]

groups = dashboard.groupby("category")['price'].sum()

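for c1, c2 in pairs:
    # Count the rows that belong to either category of the pair.
    new_id = sum((dashboard['category'] == c1) | (dashboard['category'] == c2))
    name = '{}&{}'.format(c1, c2)
    price_sum = groups[c1] + groups[c2]
    new_row = pd.DataFrame({'id': [new_id], 'category': [name], 'price': [price_sum], 'description': ['']})
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
    dashboard = pd.concat([dashboard, new_row])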

print(dashboard)
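
The same idea can be extended beyond pairs. The sketch below is one possible generalisation, not part of the original answer: `add_summary_rows` is a hypothetical helper that pre-computes the count and sum per category once and then combines them for groups of any size. It assumes every category named in a group exists in the frame; otherwise `.loc` raises a `KeyError`.

```python
import pandas as pd


def add_summary_rows(df, groups):
    """Append one count/sum row per group of categories (hypothetical helper)."""
    # One pass over the data: row count and price sum for every category.
    stats = df.groupby('category')['price'].agg(['count', 'sum'])
    rows = []
    for group in groups:
        picked = stats.loc[list(group)]  # KeyError if a category is missing
        rows.append({
            'id': picked['count'].sum(),
            'category': '&'.join(group),
            'price': picked['sum'].sum(),
            'description': '',
        })
    return pd.concat([df, pd.DataFrame(rows)], ignore_index=True)


dashboard = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'category': ['a', 'b', 'a', 'c'],
    'price': [123, 151, 21, 24],
    'description': ['IT related', 'IT related', 'Marketing', '']
})

result = add_summary_rows(dashboard, [('a', 'b'), ('a', 'b', 'c')])
print(result)
```

Note that `count` here counts non-null prices per category, which matches the row count only when `price` has no missing values.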

Upvotes: 0
