Reputation: 4864
Suppose I have a stock and a minute-by-minute dataframe of prices and volumes:
   close    high     low    open symbol                  timestamp tradingDay  volume
0  116.02  116.02  116.02  116.02   AAPL  2017-01-03 04:03:00-05:00 2017-01-03     200
1  116.14  116.14  116.14  116.14   AAPL  2017-01-03 04:06:00-05:00 2017-01-03     100
2  116.15  116.24  116.12  116.14   AAPL  2017-01-03 04:07:00-05:00 2017-01-03    1370
3  116.13  116.15  116.13  116.15   AAPL  2017-01-03 04:08:00-05:00 2017-01-03    1100
4  116.15  116.15  116.15  116.15   AAPL  2017-01-03 04:09:00-05:00 2017-01-03     100
I want to combine these into a "vwap" (volume-weighted average price), and this seems easy:
zoo = foo.groupby('tradingDay').agg(lambda x: x['close'].dot(x['volume']) / np.sum(x.volume))
But the resulting object has as many columns as the original dataframe (each column being the same and equal to the desired vwap price). Of course, one can then select a random column, but that seems morally impure and memory-wasteful. I am presumably missing something very obvious, but what?
Upvotes: 1
Views: 1229
Reputation: 323396
IIUC, use apply instead of agg: agg calls the lambda once per column, while apply passes each group as a whole DataFrame, so cross-column arithmetic works and you get a single value per group.
df.groupby('tradingDay').apply(lambda x: sum(x['close'].mul(x['volume'])) / np.sum(x['volume']))
Out[323]:
tradingDay
2017-01-03 116.132927
dtype: float64
Upvotes: 2
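A self-contained sketch of the same idea, using the sample rows from the question and `np.average` with `weights` (an equivalent, slightly more idiomatic way to express the weighted mean; the variable names here are illustrative, not from the original post):

```python
import numpy as np
import pandas as pd

# Minute bars taken from the question (only the columns VWAP needs).
foo = pd.DataFrame({
    'close':      [116.02, 116.14, 116.15, 116.13, 116.15],
    'volume':     [200, 100, 1370, 1100, 100],
    'tradingDay': ['2017-01-03'] * 5,
})

# apply receives each group as a DataFrame, so we can combine
# the 'close' and 'volume' columns and return one scalar per day.
vwap = foo.groupby('tradingDay').apply(
    lambda x: np.average(x['close'], weights=x['volume'])
)
print(vwap)
```

This reproduces the answer's result (116.132927 for 2017-01-03) as a Series indexed by trading day, with one column's worth of memory rather than a full copy of the frame.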