Reputation: 4498
I have data by date and want to create a new dataframe by week with sum of sales and count of categories.
#standard packages
import numpy as np
import pandas as pd
#visualization
%matplotlib inline
import matplotlib.pylab as plt
#load the raw data, parsing column 6 as dates
edf = pd.read_csv(r'C:\Users\j~\raw.csv', parse_dates=[6])
edf2 = edf[['DATESENT','Sales','Category']].copy()
edf2
#output
DATESENT | SALES | CATEGORY
2014-01-04 100 A
2014-01-05 150 B
2014-01-07 150 C
2014-01-10 175 D
#create datetime index of week
edf2['DATESENT']=pd.to_datetime(edf2['DATESENT'],format='%m/%d/%Y')
edf2 = edf2.set_index('DATESENT')
edf2.resample('W').sum()
#output
SALES CATEGORY
DATESENT
2014-01-05 250 AB
2014-01-12 325 CD
But I am looking for
SALES CATEGORY
DATESENT
2014-01-05 250 2
2014-01-12 325 2
This didn't work ...
edf2 = e2.resample('W').agg("Category":len,"Sales":np.sum)
Thank you
Upvotes: 10
Views: 18173
Reputation: 294318
Using pd.Grouper + agg (pd.TimeGrouper was deprecated in pandas 0.21 and removed in 1.0; pd.Grouper(freq='W') replaces it):
f = {'SALES': 'sum', 'CATEGORY': 'count'}
g = pd.Grouper(freq='W')
df.set_index('DATESENT').groupby(g).agg(f)
CATEGORY SALES
DATESENT
2014-01-05 2 250
2014-01-12 2 325
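For anyone who wants to run this end to end, here is a minimal sketch on the four sample rows from the question. It assumes a recent pandas, where pd.Grouper(freq='W') stands in for pd.TimeGrouper, and it uses the key argument instead of set_index; the variable name weekly is just for illustration.

```python
import pandas as pd

# Sample rows copied from the question's table.
df = pd.DataFrame({
    "DATESENT": pd.to_datetime(
        ["2014-01-04", "2014-01-05", "2014-01-07", "2014-01-10"]
    ),
    "SALES": [100, 150, 150, 175],
    "CATEGORY": ["A", "B", "C", "D"],
})

# Group on the date column in weekly bins (freq="W" means weeks ending Sunday),
# then sum the sales and count the non-null categories in each bin.
weekly = df.groupby(pd.Grouper(key="DATESENT", freq="W")).agg(
    {"SALES": "sum", "CATEGORY": "count"}
)
print(weekly)
```

Each row is labelled with the Sunday that closes the week, which is why 2014-01-04 and 2014-01-05 land in the 2014-01-05 bin.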
Upvotes: 2
Reputation: 153460
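agg accepts a dictionary that maps each column name to the aggregation to apply to it. The attempt in the question is missing the braces around that dictionary: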
edf2 = e2.resample('W').agg({"Category":'size',"Sales":'sum'})
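A runnable sketch of the same call on the question's sample rows, assuming the column names Sales and Category from the question's code and a frame already indexed by DATESENT. The name weekly is hypothetical.

```python
import pandas as pd

# Sample rows from the question, indexed by the date column.
edf2 = pd.DataFrame(
    {
        "Sales": [100, 150, 150, 175],
        "Category": ["A", "B", "C", "D"],
    },
    index=pd.DatetimeIndex(
        ["2014-01-04", "2014-01-05", "2014-01-07", "2014-01-10"],
        name="DATESENT",
    ),
)

# One aggregation per column: number of rows for Category, total for Sales.
weekly = edf2.resample("W").agg({"Category": "size", "Sales": "sum"})
print(weekly)
```

Note that 'size' counts every row in the bin, including rows where Category is missing, while 'count' only counts non-null values. On this sample both give 2 per week.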
Upvotes: 19