Igor Rivin
Igor Rivin

Reputation: 4864

combining multiple columns pandas groupby

suppose I have a stock, and I have a minute-by-minte dataframe of prices and volumes:

close   high    low open    symbol  timestamp   tradingDay  volume
0   116.02  116.02  116.02  116.02  AAPL    2017-01-03 04:03:00-05:00   
2017-01-03  200
1   116.14  116.14  116.14  116.14  AAPL    2017-01-03 04:06:00-05:00   
2017-01-03  100
2   116.15  116.24  116.12  116.14  AAPL    2017-01-03 04:07:00-05:00   
2017-01-03  1370
3   116.13  116.15  116.13  116.15  AAPL    2017-01-03 04:08:00-05:00   
2017-01-03  1100
4   116.15  116.15  116.15  116.15  AAPL    2017-01-03 04:09:00-05:00   
2017-01-03  100

(sorry about the formatting)

I want to combine these into "vwap" (volume weighed average price), and this seems easy:

zoo= foo.groupby('tradingDay').agg( lambda x: x['close'].dot(x['volume'])/np.sum(x.volume))

But the resulting object has as many columns as the original dataframe (each column being the same and equal to the desired vwap price). Of course, one can then select a random column, but that seems morally impure and memory-wasteful. I am presumably missing something very obvious, but what?

Upvotes: 1

Views: 1229

Answers (1)

BENY
BENY

Reputation: 323396

IIUC

df.groupby('tradingDay').apply(lambda x: sum(x['close'].mul(x['volume']))/np.sum(x['volume']))
Out[323]: 
tradingDay
2017-01-03    116.132927
dtype: float64

Upvotes: 2

Related Questions