Reputation: 21
I have some data in a Pandas Dataframe. I'd like to calculate the moving average from that data, including up to N trailing rows and up to Q leading rows:
import pandas
my_data = pandas.DataFrame({
'values': [5.0, 4.0, 3.0, 5.0, 5.0, 6.0, 4.0, 6.0, 7.0, 4.0, 5.0, 5.0]
})
my_data
values
0 5.0
1 4.0
2 3.0
3 5.0
4 5.0
5 6.0
6 4.0
7 6.0
8 7.0
9 4.0
10 5.0
11 5.0
N=2
Q=3
get_moving_average(my_data, lagging=N, leading=Q)
values mavg
0 5 4.250000
1 4 4.400000
2 3 4.666667
3 5 4.500000
4 5 4.833333
5 6 5.500000
6 4 5.333333
7 6 5.333333
8 7 5.166667
9 4 5.400000
10 5 5.250000
11 5 4.666667
Here, the moving average for row 0 is 4.25 - the value of cell 0 plus 0 bolded lagging rows (as there are none) and 3 italicized leading rows (the value of cells 1:3 - so (5.0 + 4.0 + 3.0 + 5.0) / 4.0 = 17.0 / 4.0 = 4.25
The moving average for row 1 includes all of the above, but also the bolded trailing row 0 (as we accept 2 lagging rows, but only one is present), yielding (5.0 + 4.0 + 3.0 + 5.0 + 5.0) / 5.0 = 22.0 / 5.0 = 4.4.
And so on. But, pandas' rolling method only takes a single argument for window, not something that allows for choosing the size on each side, and seems to have issues where insufficient data is present.
Is there an alternative to rolling that covers this, or do I need to roll my own?
Upvotes: 2
Views: 762
Reputation: 153460
I am not sure this will work for all situations without further testing.
Let's get creative. Use rolling
and shift
backwards in the "rolling window" to get the appropriate trailing and leading row count. To handle the NaN at the of the rolling mean, let's roll backwards and fillna
.
N=2
Q=3
T = N+Q+1
my_data.rolling(T, min_periods=1).mean().shift(-Q)\
.fillna(my_data[::-1].rolling(T, min_periods=Q).mean().shift(-N)[::-1])
Output:
values
0 4.250000
1 4.400000
2 4.666667
3 4.500000
4 4.833333
5 5.500000
6 5.333333
7 5.333333
8 5.166667
9 5.400000
10 5.250000
11 4.666667
Steps:
Do a rolling calculation over the full interval with, in your case, would be N (Lagging) + Q (leading) and the current which sums to total (T).
my_data.rolling(T, min_periods=1).mean()
Now, let's shift the calculation using shift
back such that the window is as expected, N lagging and Q leading, so shift -Q.
my_data.rolling(T, min_periods=1).mean().shift(-Q)
Output:
values
0 4.250000
1 4.400000
2 4.666667
3 4.500000
4 4.833333
5 5.500000
6 5.333333
7 5.333333
8 5.166667
9 NaN
10 NaN
11 NaN
Now to handle those NaN values at the end of the dataframe, we do a rolling calculation in reverse and use fillna fill the original. [::-1] is a reversing element to start at the end and rolling to the begin, then use [::-1] to flip again to line up with the original result.
my_data[::-1].rolling(T, min_periods=Q).mean().shift(-N)[::-1]
Outputs:
values
0 NaN
1 NaN
2 4.666667
3 4.500000
4 4.833333
5 5.500000
6 5.333333
7 5.333333
8 5.166667
9 5.400000
10 5.250000
11 4.666667
Use fillna to on the first series with the second series to get:
N=2
Q=3
T = N+Q+1
my_data.rolling(T, min_periods=1).mean().shift(-Q)\
.fillna(my_data[::-1].rolling(T, min_periods=Q).mean().shift(-N)[::-1])
Output:
values
0 4.250000
1 4.400000
2 4.666667
3 4.500000
4 4.833333
5 5.500000
6 5.333333
7 5.333333
8 5.166667
9 5.400000
10 5.250000
11 4.666667
Upvotes: 1