Reputation: 6220
I have a numpy array
a = np.arange(12)
>>> [0,1,2,3,4,5,6,7,8,9,10,11]
I am trying to calculate all possible cumsums like this
np.cumsum[2:] + np.cumsum[:-2]
np.cumsum[3:] + np.cumsum[:-3]
...
np.cumsum[11:] + np.cumsum[:-11]
How can I achieve this without a loop? I tried doing
starts = np.arange(2,12)
np.cumsum[starts:] + np.cumsum[:-starts]
but I get this error
TypeError: only integer scalar arrays can be converted to a scalar index
How do I do this without a for loop?
What I am trying to do
I am trying to calculate the moving average over every possible window length within a sequence. For example, with an array of size 10, I could compute a 1-period moving average (which doesn't make sense), a 2-period moving average, a 3-period one, and so on up to 10 periods. How do I accomplish this? I want to calculate the moving average for every window length from 2 to n, where n is the size of the sequence.
Upvotes: 1
Views: 205
Reputation: 2821
This is not exactly what you asked for, but if you are looking for a simpler solution, you can use the pandas approach.
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': np.arange(11)}) # your data
window_lengths = np.arange(2, len(df)) # window lengths 2 .. n-1; use len(df) + 1 to include n
[df.rolling(int(length)).mean() for length in window_lengths]
Output:
[ a
0 NaN
1 0.5
2 1.5
3 2.5
4 3.5
5 4.5
6 5.5
7 6.5
8 7.5
9 8.5
10 9.5, a
0 NaN
1 NaN
2 1.0
3 2.0
4 3.0
5 4.0
6 5.0
7 6.0
8 7.0
9 8.0
10 9.0, a
0 NaN
1 NaN
2 NaN
3 1.5
4 2.5
5 3.5
6 4.5
7 5.5
8 6.5
9 7.5
10 8.5, a
0 NaN
1 NaN
2 NaN
3 NaN
4 2.0
5 3.0
6 4.0
7 5.0
8 6.0
9 7.0
10 8.0, a
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 2.5
6 3.5
7 4.5
8 5.5
9 6.5
10 7.5, a
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 3.0
7 4.0
8 5.0
9 6.0
10 7.0, a
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 3.5
8 4.5
9 5.5
10 6.5, a
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 4.0
9 5.0
10 6.0, a
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 4.5
10 5.5]
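If you would rather stay in NumPy, the same rolling means can be sketched with a cumulative sum, since the sum of any window is the difference of two cumsum entries. This is only a sketch under my own assumptions: the helper name all_moving_averages is made up for illustration, it keeps only the full windows (no leading NaN rows), and it covers window lengths 2 to n inclusive.

```python
import numpy as np

def all_moving_averages(a):
    """Return {window: means over every full window of that length}.

    Hypothetical helper for illustration, not part of NumPy or pandas.
    """
    a = np.asarray(a, dtype=float)
    n = len(a)
    # Prepend 0 so that c[j] - c[i] is the sum of a[i:j].
    c = np.concatenate(([0.0], np.cumsum(a)))
    # One vectorised subtraction per window length.
    return {w: (c[w:] - c[:-w]) / w for w in range(2, n + 1)}

means = all_moving_averages(np.arange(11))
print(means[2])    # window of 2
print(means[10])   # window of 10
```

The loop over window lengths is still there, but each iteration is a single array subtraction rather than a rolling computation.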
Upvotes: 1
Reputation: 5294
I'm not sure I understood the question completely, but here's something you could use as a starting point.
You need arrays with uniform sizes to be able to exploit vectorization. You cannot do it with simple slicing but zero padding can help in this case:
In [3]: a = np.arange(12)
In [4]: a
Out[4]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
In [15]: starts = np.arange(2,12)
In [18]: left = np.stack([np.pad(a,(0,s),mode="constant")[s:] for s in starts])
In [19]: left
Out[19]:
array([[ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 0],
[ 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 0, 0],
[ 4, 5, 6, 7, 8, 9, 10, 11, 0, 0, 0, 0],
[ 5, 6, 7, 8, 9, 10, 11, 0, 0, 0, 0, 0],
[ 6, 7, 8, 9, 10, 11, 0, 0, 0, 0, 0, 0],
[ 7, 8, 9, 10, 11, 0, 0, 0, 0, 0, 0, 0],
[ 8, 9, 10, 11, 0, 0, 0, 0, 0, 0, 0, 0],
[ 9, 10, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[10, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
Here you need to also shift everything to the left to get proper alignment:
In [27]: right = np.stack([ np.roll(np.pad(a, (s,0), mode="constant")[:-s], -s) for s in starts ])
In [28]: right
Out[28]:
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 0, 0],
[0, 1, 2, 3, 4, 5, 6, 7, 0, 0, 0, 0],
[0, 1, 2, 3, 4, 5, 6, 0, 0, 0, 0, 0],
[0, 1, 2, 3, 4, 5, 0, 0, 0, 0, 0, 0],
[0, 1, 2, 3, 4, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
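The two list comprehensions above still loop over starts in Python. As a sketch of a loop-free variant, the same padded matrices can be built with broadcasting: compute the source index of every element and mask the positions that fall outside the array. The names cols, idx and valid are my own and only serve the illustration.

```python
import numpy as np

a = np.arange(12)
starts = np.arange(2, 12)
n = len(a)

cols = np.arange(n)             # column positions 0 .. n-1
idx = cols + starts[:, None]    # row for shift s reads a[s], a[s+1], ...
valid = idx < n                 # True where the shifted index is still inside a

# left: a[s:] followed by s zeros (indices are clipped, then masked out)
left = np.where(valid, a[np.minimum(idx, n - 1)], 0)
# right: a[:-s] followed by s zeros (the same mask applies)
right = np.where(valid, a, 0)
```

Both arrays have shape (len(starts), len(a)), so they can be passed to np.cumsum along axis 1 exactly like the stacked versions.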
Now you can use the vectorized np.cumsum for the intensive part:
In [41]: np.cumsum(left, axis=1) + np.cumsum(right, axis=1)
Out[41]:
array([[ 2, 6, 12, 20, 30, 42, 56, 72, 90, 110, 110, 110],
[ 3, 8, 15, 24, 35, 48, 63, 80, 99, 99, 99, 99],
[ 4, 10, 18, 28, 40, 54, 70, 88, 88, 88, 88, 88],
[ 5, 12, 21, 32, 45, 60, 77, 77, 77, 77, 77, 77],
[ 6, 14, 24, 36, 50, 66, 66, 66, 66, 66, 66, 66],
[ 7, 16, 27, 40, 55, 55, 55, 55, 55, 55, 55, 55],
[ 8, 18, 30, 44, 44, 44, 44, 44, 44, 44, 44, 44],
[ 9, 20, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33],
[ 10, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22],
[ 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11]])
Now you probably need to clean up the result to get what you want, but I'm still not sure; it would be great if you could post the expected output. Something like this should do, where csum holds the sum computed above:
In [49]: csum = np.cumsum(left, axis=1) + np.cumsum(right, axis=1)
In [50]: [ row[:-s] for row,s in zip(csum,starts) ]
Out[50]:
[array([ 2, 6, 12, 20, 30, 42, 56, 72, 90, 110]),
array([ 3, 8, 15, 24, 35, 48, 63, 80, 99]),
array([ 4, 10, 18, 28, 40, 54, 70, 88]),
array([ 5, 12, 21, 32, 45, 60, 77]),
array([ 6, 14, 24, 36, 50, 66]),
array([ 7, 16, 27, 40, 55]),
array([ 8, 18, 30, 44]),
array([ 9, 20, 33]),
array([10, 22]),
array([11])]
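To check that the padded version matches the per-slice computation it replaces, each trimmed row can be compared with np.cumsum(a[s:]) + np.cumsum(a[:-s]). That this is what the question's expression was meant to compute is my assumption; the question does not confirm it. A minimal sketch:

```python
import numpy as np

a = np.arange(12)
starts = np.arange(2, 12)

# Same padded matrices as above; right is built with a plain pad,
# which is equivalent to the pad-and-roll construction.
left = np.stack([np.pad(a, (0, s), mode="constant")[s:] for s in starts])
right = np.stack([np.pad(a[:-s], (0, s), mode="constant") for s in starts])
csum = np.cumsum(left, axis=1) + np.cumsum(right, axis=1)

trimmed = [row[:-s] for row, s in zip(csum, starts)]
# Direct per-slice version (assumed meaning of the question's expression).
direct = [np.cumsum(a[s:]) + np.cumsum(a[:-s]) for s in starts]

matches = all(np.array_equal(t, d) for t, d in zip(trimmed, direct))
print(matches)
```

If the intended result is different, the direct list is the place to change, and the comparison will show whether the padded approach still applies.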
Upvotes: 1