Ivan Sinyansky

Reputation: 33

Python numpy, reshape/transform array avoiding iteration through rows

I have a time series with 4 features at each step, stored as an array of rows with 4 columns. I want to convert it so that each entry contains the feature vectors of two consecutive rows, N-1 and N.

import numpy as np
a = np.array([[1,2,3,0], [4,5,6,0], [7,8,9,0], [10,11,12,0]])
array([[ 1,  2,  3,  0],
       [ 4,  5,  6,  0],
       [ 7,  8,  9,  0],
       [10, 11, 12,  0]])

a.shape
(4, 4)

convert to:

array([[[ 1,  2,  3,  0],
        [ 4,  5,  6,  0]],

       [[ 4,  5,  6,  0],
        [ 7,  8,  9,  0]],

       [[ 7,  8,  9,  0],
        [10, 11, 12,  0]]])
a_.shape
(3, 2, 4)

I'm using the following code to do that:

seq_len = 2
for i in range(seq_len, a.shape[0]+1):
    if i-seq_len == 0:
        a_ = a[i-seq_len:i, :].reshape(1, -1, 4)
    else:
        a_ = np.vstack([a_, a[i-seq_len:i, :].reshape(1, -1, 4)])

It works, but I don't think it is an optimal solution. Could you please suggest how I can improve my code by avoiding the for loop?

Upvotes: 3

Views: 482

Answers (2)

nicoco

Reputation: 1553

Take two slices of the array offset by one row and join them with np.stack along a new axis 1.

np.stack((a[:-1], a[1:]), axis=1)
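For reference, a minimal sketch that checks this on the array from the question and shows one way it might be generalized to an arbitrary seq_len. The helper name windows_by_stack is hypothetical (it is not part of NumPy), and the sketch assumes seq_len is no larger than the number of rows.

```python
import numpy as np

a = np.array([[1, 2, 3, 0], [4, 5, 6, 0], [7, 8, 9, 0], [10, 11, 12, 0]])

# a[:-1] holds rows 0..n-2 and a[1:] holds rows 1..n-1, so stacking them
# along a new axis 1 pairs every row with its successor.
pairs = np.stack((a[:-1], a[1:]), axis=1)
print(pairs.shape)  # (3, 2, 4)


def windows_by_stack(arr, seq_len):
    """Hypothetical helper: stack seq_len shifted slices of arr along axis 1.

    Assumes 1 <= seq_len <= arr.shape[0].
    """
    n_windows = arr.shape[0] - seq_len + 1
    # This loop runs seq_len times, not once per row.
    return np.stack([arr[k:k + n_windows] for k in range(seq_len)], axis=1)


triples = windows_by_stack(a, 3)
print(triples.shape)  # (2, 3, 4)
```

With seq_len = 2 the helper reduces to the one-liner above.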

Some timings to compare with the other answer out there.

In [13]: s = 1_000_000

In [15]: a = np.arange(s).reshape((s//4,4))

In [21]: %timeit a[[(i-1,i) for i in range(1,a.shape[0])],:]
127 ms ± 724 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [22]: %timeit np.stack((a[:-1], a[1:]), axis=1)  # My solution
6.8 ms ± 8.18 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Avoiding any Python-level for loop is the way to go; the OP was right.
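Another loop-free option, not included in the timings above, may be np.lib.stride_tricks.sliding_window_view. The sketch below assumes NumPy 1.20 or later, where that function is available. The result is a read-only view of the original array, so copy it before modifying it.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.array([[1, 2, 3, 0], [4, 5, 6, 0], [7, 8, 9, 0], [10, 11, 12, 0]])
seq_len = 2

# Windowing along axis 0 appends the window dimension last: shape (3, 4, 2).
view = sliding_window_view(a, seq_len, axis=0)

# Move the window axis in front of the feature axis to get shape (3, 2, 4).
windows = view.transpose(0, 2, 1)
print(windows.shape)  # (3, 2, 4)
```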

Upvotes: 8

Hennich

Reputation: 699

Use integer (fancy) indexing with pairs of consecutive row indices: a[[(i-1,i) for i in range(1,a.shape[0])],:]

Edit: nicoco's answer is the better one.
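If the indexing approach is still wanted, the index pairs could probably be built by broadcasting instead of a list comprehension. This is only a sketch; the names n_windows and idx are illustrative and do not come from the answer.

```python
import numpy as np

a = np.array([[1, 2, 3, 0], [4, 5, 6, 0], [7, 8, 9, 0], [10, 11, 12, 0]])
seq_len = 2
n_windows = a.shape[0] - seq_len + 1

# A column of window start positions plus a row of offsets broadcasts to
# idx[i, k] = i + k, here [[0, 1], [1, 2], [2, 3]].
idx = np.arange(n_windows)[:, None] + np.arange(seq_len)

# Indexing rows with a 2-D integer array yields shape idx.shape + (4,).
windows = a[idx]
print(windows.shape)  # (3, 2, 4)
```

Unlike plain slicing, integer indexing copies the selected rows.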

Upvotes: 0
