Reputation: 35
I would like to take a dataframe with MultiIndex columns (where the index is a DatetimeIndex), and the aggregate by different functions depending on the column.
For example, consider the following table where index includes dates, first level of columns are Price and Volume, and second level of columns are tickers (e.g. AAPL and AMZN).
df1 = pd.DataFrame({"ticker":["AAPL"]*365,
'date': pd.date_range(start='20170101', end='20171231'),
'volume' : [np.random.randint(50,100) for i in range(365)],
'price': [np.random.randint(100,200) for i in range(365)]})
df2 = pd.DataFrame({"ticker":["AMZN"]*365,
'date': pd.date_range(start='20170101', end='20171231'),
'volume' : [np.random.randint(50,100) for i in range(365)],
'price': [np.random.randint(100,200) for i in range(365)]})
df = pd.concat([df1,df2])
grp = df.groupby(['date', 'ticker']).mean().unstack()
grp.head()
What I would like to do is to aggregate the data by month, but taking the mean of price and sum of volume.
I would have thought that something along the lines of grp.resample("MS").agg({"price":"mean", "volume":"sum"})
should work, but it does not because of the multi-index column. What's the best way to accomplish this?
Upvotes: 1
Views: 668
Reputation: 323246
You can
df.groupby([pd.to_datetime(df.date).dt.strftime('%Y-%m'),df.ticker]).\
agg({"price":"mean", "volume":"sum"}).unstack()
Out[529]:
price volume
ticker AAPL AMZN AAPL AMZN
date
2017-01 155.548387 141.580645 2334 2418
2017-02 154.035714 156.821429 2112 2058
2017-03 154.709677 148.806452 2258 2188
2017-04 154.366667 149.366667 2271 2254
2017-05 154.774194 155.096774 2331 2264
2017-06 147.333333 145.133333 2220 2302
2017-07 149.709677 150.645161 2188 2412
2017-08 150.806452 154.645161 2265 2341
2017-09 157.033333 151.466667 2199 2232
2017-10 149.387097 145.580645 2303 2203
2017-11 154.100000 150.266667 2212 2275
2017-12 156.064516 149.290323 2265 2224
Upvotes: 2