harry04
harry04

Reputation: 962

Aggregating column values of dataframe to a new dataframe

I have a dataframe which involves Vendor, Product, Price of various listings on a market among other column values.

Dataframe

I need a dataframe which has the unique vendors, number of products, sum of their product listings, average price/product and (average * no. of sales) as different columns.

Something like this -

Result

What's the best way to make this new dataframe?

Thanks!

Upvotes: 2

Views: 876

Answers (3)

Rahul Chawla
Rahul Chawla

Reputation: 1078

You can do this by using pandas pivot_table. Here is an example based on your data.

import pandas as pd
import numpy as np

>>> f = pd.pivot_table(d, index=['Vendor', 'Sales'], values=['Price', 'Product'], aggfunc={'Price': np.sum, 'Product':np.ma.count}).reset_index()

>>> f['Avg Price/Product'] = f['Price']/f['Product']

>>> f['H Factor'] = f['Sales']*f['Avg Price/Product']

>>> f.drop('Sales', axis=1)

  Vendor  Price  Product  Avg Price/Product  H Factor
0      A    121        4              30.25    6050.0
1      B     12        1              12.00    1440.0
2      C     47        2              23.50     587.5
3      H     45        1              45.00    9000.0

Upvotes: 0

jezrael
jezrael

Reputation: 862661

First multiple columns Number of Sales with Price, then use DataFrameGroupBy.agg by dictionary of columns names with aggregate functions, then flatten MultiIndex in columns by map and rename. :

df['Number of Sales'] *=  df['Price']

d1 = {'Product':'size', 'Price':['sum', 'mean'], 'Number of Sales':'mean'}
df = df.groupby('Vendor').agg(d1)
df.columns = df.columns.map('_'.join)
d = {'Product_size':'No. of Product',
     'Price_sum':'Sum of Prices',
     'Price_mean':'Mean of Prices',
     'Number of Sales_mean':'H Factor'
     }
df = df.rename(columns=d).reset_index()
print (df)
  Vendor  No. of Product  Sum of Prices  Mean of Prices  H Factor
0      A               4            121           30.25    6050.0
1      B               1             12           12.00    1440.0
2      C               2             47           23.50     587.5
3      H               1             45           45.00    9000.0

Upvotes: 2

John Zwinck
John Zwinck

Reputation: 249153

You can do it using groupby(), like this:

df.groupby('Vendor').agg({'Products': 'count', 'Price': ['sum', 'mean']})

That's just three columns, but you can work out the rest.

Upvotes: 1

Related Questions