Reputation: 13
I have a dataframe as follows:
dashboard = pd.DataFrame({
'id':[1,2,3,4],
'category': ['a', 'b', 'a', 'c'],
'price': [123, 151, 21, 24],
'description': ['IT related', 'IT related', 'Marketing','']
})
I need to add a row that shows the row count (in id) and the sum of price for only some categories, as follows:
pd.DataFrame({
'id': [3],
'category': ['a&b'],
'price': [295],
'description': ['']
})
Upvotes: 0
Views: 848
Reputation: 712
import numpy as np
import pandas as pd
dashboard = pd.DataFrame({
'id':[1,2,3,4],
'category': ['a', 'b', 'a', 'c'],
'price': [123, 151, 21, 24],
'description': ['IT related', 'IT related', 'Marketing','']
})
selection =['a','b']
selection_row = '&'.join(selection)
df2 = dashboard[dashboard['category'].isin(selection)].agg({'id' : ['count'], 'price' : ['sum']}).fillna(0).T
df2['summary'] = df2['count'].add(df2['sum'])
df2.loc['description'] =np.nan
df2.loc['category'] = selection_row
final_df = df2['summary']
final_df
id 3
price 295
description NaN
category a&b
Name: summary, dtype: object
Upvotes: 0
Reputation: 4011
An option using .agg:
dashboard = pd.DataFrame({
'id': [1, 2, 3, 4],
'category': ['a', 'b', 'a', 'c'],
'price': [123, 151, 21, 24],
'description': ['IT related', 'IT related', 'Marketing', '']
})
a_b = dashboard[dashboard['category'].isin(['a', 'b'])].agg({'id': 'count', 'price': 'sum'})
df = pd.DataFrame({'a&b':a_b})
yields
a&b
id 3
price 295
which you could then .transpose() and merge into your existing dataframe if desired, or compile into a separate dataframe of summary results, etc.
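As a rough sketch of that last step, the aggregated Series can be turned into a one-row frame and concatenated onto the original. The names summary and result, the 'a&b' label, and the empty description are illustrative assumptions, not part of the answer above.

```python
import pandas as pd

dashboard = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'category': ['a', 'b', 'a', 'c'],
    'price': [123, 151, 21, 24],
    'description': ['IT related', 'IT related', 'Marketing', ''],
})

# Aggregate only the rows whose category is 'a' or 'b'.
a_b = dashboard[dashboard['category'].isin(['a', 'b'])].agg({'id': 'count', 'price': 'sum'})

# Turn the Series into a one-row frame with columns 'id' and 'price'.
summary = a_b.to_frame().T
summary['category'] = 'a&b'    # label chosen for illustration
summary['description'] = ''    # assumed placeholder, as in the question

# Reorder the columns to match the original, then append the summary row.
result = pd.concat([dashboard, summary[dashboard.columns]], ignore_index=True)
print(result)
```

Using pd.concat with ignore_index=True keeps a clean 0..n index; if the summary rows should stay separate, the same summary frames can be collected in a list and concatenated on their own instead.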
Upvotes: 1
Reputation: 13403
Pre-calculate the price sum for each category; then, for each pair, add the two sums, join the category names, and append the new row.
Try this:
import pandas as pd
dashboard = pd.DataFrame({
'id': [1, 2, 3, 4],
'category': ['a', 'b', 'a', 'c'],
'price': [123, 151, 21, 24],
'description': ['IT related', 'IT related', 'Marketing', '']
})
pairs = [('a', 'b')]
groups = dashboard.groupby("category")['price'].sum()
for c1, c2 in pairs:
    # count the rows that belong to either category
    new_id = dashboard['category'].isin([c1, c2]).sum()
    name = '{}&{}'.format(c1, c2)
    price_sum = groups[c1] + groups[c2]
    # DataFrame.append was removed in pandas 2.0, so build the row and use pd.concat
    new_row = pd.DataFrame({'id': [new_id], 'category': [name], 'price': [price_sum], 'description': ['']})
    dashboard = pd.concat([dashboard, new_row], ignore_index=True)
print(dashboard)
Upvotes: 0