Reputation: 110
My dataframe has a daily price column and a window size column :
df = pd.DataFrame(columns = ['price', 'window'],
data = [[100, 1],[120, 2], [115, 2], [116, 2], [100, 4]])
df
price window
0 100 1
1 120 2
2 115 2
3 116 2
4 100 4
I would like to compute the rolling mean of price for each row using the window of the window column.
The result would be this :
df
price window rolling_mean_price
0 100 1 100.00
1 120 2 110.00
2 115 2 117.50
3 116 2 115.50
4 100 4 112.75
I don't find any elegant way to do it with apply and I refuse to loop over each row of my DataFrame...
Upvotes: 1
Views: 147
Reputation: 61910
The best solutions, in terms of raw speed and complexity, are based on ideas from summed-area table. The problem can be consider as a table of one dimension. Below you can find several approaches, ranked from best to worst.
Numpy + Linear complexity
size = len(df['price'])
price = np.zeros(size + 1)
price[1:] = df['price'].values.cumsum()
window = np.clip(np.arange(size) - (df['window'].values - 1), 0, None)
df['rolling_mean_price'] = (price[1:] - price[window]) / df['window'].values
print(df)
Output
price window rolling_mean_price
0 100 1 100.00
1 120 2 110.00
2 115 2 117.50
3 116 2 115.50
4 100 4 112.75
Loopy + Linear complexity
price = df['price'].values.cumsum()
df['rolling_mean_price'] = [(price[i] - float((i - w) > -1) * price[i-w]) / w for i, w in enumerate(df['window'])]
Loopy + Quadratic complexity
price = df['price'].values
df['rolling_mean_price'] = [price[i - (w - 1):i + 1].mean() for i, w in enumerate(df['window'])]
Upvotes: 2
Reputation: 7625
I would not recommend this approach using pandas.DataFrame.apply()
(reasons described here), but if you insist on it, here is one solution:
df['rolling_mean_price'] = df.apply(
lambda row: df.rolling(row.window).price.mean().iloc[row.name], axis=1)
The output looks like this:
>>> print(df)
price window rolling_mean_price
0 100 1 100.00
1 120 2 110.00
2 115 2 117.50
3 116 2 115.50
4 100 4 112.75
Upvotes: 1