PirateApp

Reputation: 6220

How to get all possible slices of a 1D numpy array depending on input

I have a numpy array

a = np.arange(12)
>>> [0,1,2,3,4,5,6,7,8,9,10,11]

I am trying to calculate all possible cumsums like this

np.cumsum[2:] + np.cumsum[:-2]
np.cumsum[3:] + np.cumsum[:-3]
...
np.cumsum[11:] + np.cumsum[:-11]

How can I achieve this without a loop? I tried:

starts = np.arange(2,12)
np.cumsum[starts:] + np.cumsum[:-starts]
but I get this error:
TypeError: only integer scalar arrays can be converted to a scalar index

How do I do this without a for loop?

What I am trying to do

I am trying to calculate the moving average for every possible window length within a sequence. For example, with an array of size 10, I could compute a 1-period moving average (which doesn't make sense), a 2-period one, a 3-period one, and so on up to 10 periods. How do I accomplish this? I want to calculate the moving average for every window length from 2 to n, where n is the size of the sequence.

Upvotes: 1

Views: 205

Answers (2)

Siva-Sg

Reputation: 2821

This is not exactly what you asked for, but if you are looking for a simpler solution, you can use pandas.

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(11)})  # your data
window_lengths = np.arange(2, len(df))  # window lengths 2 .. n-1; use len(df) + 1 to include n
[df.rolling(length).mean() for length in window_lengths]  # one rolling-mean frame per window length

Output:

 [      a
     0   NaN
     1   0.5
     2   1.5
     3   2.5
     4   3.5
     5   4.5
     6   5.5
     7   6.5
     8   7.5
     9   8.5
     10  9.5,       a
     0   NaN
     1   NaN
     2   1.0
     3   2.0
     4   3.0
     5   4.0
     6   5.0
     7   6.0
     8   7.0
     9   8.0
     10  9.0,       a
     0   NaN
     1   NaN
     2   NaN
     3   1.5
     4   2.5
     5   3.5
     6   4.5
     7   5.5
     8   6.5
     9   7.5
     10  8.5,       a
     0   NaN
     1   NaN
     2   NaN
     3   NaN
     4   2.0
     5   3.0
     6   4.0
     7   5.0
     8   6.0
     9   7.0
     10  8.0,       a
     0   NaN
     1   NaN
     2   NaN
     3   NaN
     4   NaN
     5   2.5
     6   3.5
     7   4.5
     8   5.5
     9   6.5
     10  7.5,       a
     0   NaN
     1   NaN
     2   NaN
     3   NaN
     4   NaN
     5   NaN
     6   3.0
     7   4.0
     8   5.0
     9   6.0
     10  7.0,       a
     0   NaN
     1   NaN
     2   NaN
     3   NaN
     4   NaN
     5   NaN
     6   NaN
     7   3.5
     8   4.5
     9   5.5
     10  6.5,       a
     0   NaN
     1   NaN
     2   NaN
     3   NaN
     4   NaN
     5   NaN
     6   NaN
     7   NaN
     8   4.0
     9   5.0
     10  6.0,       a
     0   NaN
     1   NaN
     2   NaN
     3   NaN
     4   NaN
     5   NaN
     6   NaN
     7   NaN
     8   NaN
     9   4.5
     10  5.5]
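
If a single table is easier to work with than a list of frames, the same rolling means can be gathered into one DataFrame. This is only a sketch under the same setup as above; the name all_means and the choice of one column per window length are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(11)})  # same sample data as above

# range() is used so the resulting column labels are plain Python ints.
window_lengths = range(2, len(df))

# One column per window length; rows keep the original index.
# all_means is a hypothetical name chosen for this sketch.
all_means = pd.concat(
    {w: df['a'].rolling(w).mean() for w in window_lengths},
    axis=1,
)
```

Here all_means[3] would hold the 3-period moving average, with NaN in the rows where the window does not yet fit.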

Upvotes: 1

filippo

Reputation: 5294

I'm not sure I understood the question completely, but here's something you could use as a starting point.

You need arrays with uniform sizes to be able to exploit vectorization. You cannot do it with simple slicing but zero padding can help in this case:

In [3]: a = np.arange(12)

In [4]: a
Out[4]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [15]: starts = np.arange(2,12)

In [18]: left = np.stack([np.pad(a,(0,s),mode="constant")[s:] for s in starts])

In [19]: left
Out[19]: 
array([[ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11,  0,  0],
       [ 3,  4,  5,  6,  7,  8,  9, 10, 11,  0,  0,  0],
       [ 4,  5,  6,  7,  8,  9, 10, 11,  0,  0,  0,  0],
       [ 5,  6,  7,  8,  9, 10, 11,  0,  0,  0,  0,  0],
       [ 6,  7,  8,  9, 10, 11,  0,  0,  0,  0,  0,  0],
       [ 7,  8,  9, 10, 11,  0,  0,  0,  0,  0,  0,  0],
       [ 8,  9, 10, 11,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 9, 10, 11,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [10, 11,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [11,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0]])

Here you need to also shift everything to the left to get proper alignment:

In [27]: right = np.stack([ np.roll(np.pad(a, (s,0), mode="constant")[:-s], -s) for s in starts ])

In [28]: right
Out[28]: 
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 0, 0],
       [0, 1, 2, 3, 4, 5, 6, 7, 0, 0, 0, 0],
       [0, 1, 2, 3, 4, 5, 6, 0, 0, 0, 0, 0],
       [0, 1, 2, 3, 4, 5, 0, 0, 0, 0, 0, 0],
       [0, 1, 2, 3, 4, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
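
The two list comprehensions above still loop over starts in Python. As a rough sketch, the same left and right arrays can probably be built with broadcasting alone; the helper names idx and in_bounds are made up for this illustration.

```python
import numpy as np

a = np.arange(12)
n = a.size
starts = np.arange(2, 12)

# Source index for every output cell: column k of the row with offset s reads a[k + s].
idx = np.arange(n)[None, :] + starts[:, None]   # shape (len(starts), n)
in_bounds = idx < n                             # False where the shift runs off the end

# left[r, k] = a[k + s] where that element exists, else 0.
# The index is clipped so the lookup itself never goes out of range.
left = np.where(in_bounds, a[np.minimum(idx, n - 1)], 0)

# right[r, k] = a[k] for the first n - s columns, else 0.
right = np.where(in_bounds, a[None, :], 0)
```

Both arrays have one row per entry in starts, so the rest of the steps apply unchanged.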

Now you can use the vectorized np.cumsum for the intensive part:

In [41]: csum = np.cumsum(left, axis=1) + np.cumsum(right, axis=1)

In [42]: csum
Out[42]:
array([[  2,   6,  12,  20,  30,  42,  56,  72,  90, 110, 110, 110],
       [  3,   8,  15,  24,  35,  48,  63,  80,  99,  99,  99,  99],
       [  4,  10,  18,  28,  40,  54,  70,  88,  88,  88,  88,  88],
       [  5,  12,  21,  32,  45,  60,  77,  77,  77,  77,  77,  77],
       [  6,  14,  24,  36,  50,  66,  66,  66,  66,  66,  66,  66],
       [  7,  16,  27,  40,  55,  55,  55,  55,  55,  55,  55,  55],
       [  8,  18,  30,  44,  44,  44,  44,  44,  44,  44,  44,  44],
       [  9,  20,  33,  33,  33,  33,  33,  33,  33,  33,  33,  33],
       [ 10,  22,  22,  22,  22,  22,  22,  22,  22,  22,  22,  22],
       [ 11,  11,  11,  11,  11,  11,  11,  11,  11,  11,  11,  11]])

You probably still need to clean up the result to get what you want, but I'm not sure what that is; it would be great if you could post the expected output. Something like this should do:

In [50]: [ row[:-s] for row,s in zip(csum,starts) ]
Out[50]: 
[array([  2,   6,  12,  20,  30,  42,  56,  72,  90, 110]),
 array([ 3,  8, 15, 24, 35, 48, 63, 80, 99]),
 array([ 4, 10, 18, 28, 40, 54, 70, 88]),
 array([ 5, 12, 21, 32, 45, 60, 77]),
 array([ 6, 14, 24, 36, 50, 66]),
 array([ 7, 16, 27, 40, 55]),
 array([ 8, 18, 30, 44]),
 array([ 9, 20, 33]),
 array([10, 22]),
 array([11])]
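
If the actual goal is the moving average for every window length, as the end of the question suggests, a difference of prefix sums may be closer to what is needed than the sum used above. The following is a minimal sketch under that assumption; padding with NaN where a window does not fit is my own choice, and the names windows, c, valid and means are hypothetical.

```python
import numpy as np

a = np.arange(12)
n = a.size
windows = np.arange(2, n + 1)            # window lengths 2 .. n

# Prefix sums with a leading zero, so that c[j] - c[i] == a[i:j].sum()
c = np.concatenate(([0], np.cumsum(a)))

i = np.arange(n)[None, :]                # window start index, shape (1, n)
j = i + windows[:, None]                 # window end index (exclusive), shape (len(windows), n)

valid = j <= n                           # True where the window fits inside a
j_safe = np.minimum(j, n)                # clip so the lookup stays in bounds

sums = c[j_safe] - c[i]                  # window sums via broadcasting
means = np.where(valid, sums / windows[:, None], np.nan)
```

Row r of means holds the moving average for window length windows[r], aligned to the window start, with NaN in the trailing positions where the window would run past the end of the array.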

Upvotes: 1
