Reputation: 33
I have a time series with 4 features at each step; it looks like a set of rows with 4 columns. I want to convert it so that each output row contains the feature vectors of two consecutive rows, N-1 and N.
a = np.array([[1,2,3,0], [4,5,6,0], [7,8,9,0], [10,11,12,0]])
array([[ 1,  2,  3,  0],
       [ 4,  5,  6,  0],
       [ 7,  8,  9,  0],
       [10, 11, 12,  0]])
a.shape
(4, 4)
convert to:
array([[[ 1,  2,  3,  0],
        [ 4,  5,  6,  0]],

       [[ 4,  5,  6,  0],
        [ 7,  8,  9,  0]],

       [[ 7,  8,  9,  0],
        [10, 11, 12,  0]]])
a_.shape
(3, 2, 4)
I'm using the following code to do that:
seq_len = 2
for i in range(seq_len, a.shape[0]+1):
    if i-seq_len == 0:
        a_ = a[i-seq_len:i, :].reshape(1, -1, 4)
    else:
        a_ = np.vstack([a_, a[i-seq_len:i, :].reshape(1, -1, 4)])
It works, but I don't think it is an optimal solution. Could you please suggest how I can improve my code by avoiding the 'for' loop?
Upvotes: 3
Views: 482
Reputation: 1553
Slice the array twice (all rows but the last, and all rows but the first), then join the two slices with np.stack along axis 1.
np.stack((a[:-1], a[1:]), axis=1)
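To make the one-liner above easier to follow, here is a small step-by-step sketch on the array from the question. The names prev_rows, next_rows and pairs are only illustrative and do not appear in the original post.

```python
import numpy as np

a = np.array([[1, 2, 3, 0], [4, 5, 6, 0], [7, 8, 9, 0], [10, 11, 12, 0]])

prev_rows = a[:-1]  # rows 0..N-2, shape (3, 4)
next_rows = a[1:]   # rows 1..N-1, shape (3, 4)

# axis=1 puts the new "pair" axis between the row axis and the feature axis,
# so pairs[i] holds rows i and i+1 of a.
pairs = np.stack((prev_rows, next_rows), axis=1)

print(pairs.shape)
print(pairs[0])
```

With axis=0 the result would instead have shape (2, 3, 4), which is why the axis argument matters here.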
Some timings to compare with the other answer out there.
In [13]: s = 1_000_000
In [15]: a = np.arange(s).reshape((s//4,4))
In [21]: %timeit a[[(i-1,i) for i in range(1,a.shape[0])],:]
127 ms ± 724 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [22]: %timeit np.stack((a[:-1], a[1:]), axis=1) # My solution
6.8 ms ± 8.18 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Avoiding any Python-level for-loop is the way to go; OP was right.
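The question uses a seq_len variable, so a window longer than 2 may be needed. One possible generalization, sketched here under the assumption that NumPy 1.20 or newer is available, uses numpy.lib.stride_tricks.sliding_window_view. The helper name make_windows is hypothetical and not part of the original answer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(a, seq_len):
    """Hypothetical helper: stack every run of seq_len consecutive rows of a."""
    # sliding_window_view appends the window axis last:
    # shape (N - seq_len + 1, n_features, seq_len)
    windows = sliding_window_view(a, seq_len, axis=0)
    # Move the window axis in front of the feature axis:
    # shape (N - seq_len + 1, seq_len, n_features)
    return windows.transpose(0, 2, 1)


a = np.array([[1, 2, 3, 0], [4, 5, 6, 0], [7, 8, 9, 0], [10, 11, 12, 0]])

out = make_windows(a, 2)
same = np.array_equal(out, np.stack((a[:-1], a[1:]), axis=1))
print(out.shape, same)
```

The result is a read-only view onto the original array, so call .copy() on it if the windows need to be modified.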
Upvotes: 8
Reputation: 699
Use fancy indexing with a list of index pairs: a[[(i-1,i) for i in range(1,a.shape[0])],:]
Edit: nicoco's answer is the better one.
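For reference, the index pairs in this answer can also be built without a Python-level list comprehension, by broadcasting two np.arange calls. This is only a sketch of that variant: the names n_windows, idx and pairs_idx are illustrative, and no timing is claimed for it.

```python
import numpy as np

a = np.array([[1, 2, 3, 0], [4, 5, 6, 0], [7, 8, 9, 0], [10, 11, 12, 0]])
seq_len = 2

n_windows = a.shape[0] - seq_len + 1
# Broadcasting a column of window starts against a row of offsets gives
# idx[i, j] = i + j, i.e. the row numbers that belong to window i.
idx = np.arange(n_windows)[:, None] + np.arange(seq_len)

# Integer-array indexing on the first axis returns a copy of shape
# (n_windows, seq_len, n_features).
pairs_idx = a[idx]

print(idx)
print(pairs_idx.shape)
```

Unlike basic slicing, integer-array indexing always copies the selected data.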
Upvotes: 0