Reputation: 1885
I am trying to sample every nth element to sort an array. My current solution works, but it feels like there should be a solution that does not involve concatenation.
My current implementation is as follows.
arr = np.arange(10)
print(arr)
[0 1 2 3 4 5 6 7 8 9]
# sample every 5th element
res = np.empty(shape=0)
for i in range(5):
res = np.concatenate([res, arr[i::5]])
print(res)
[0. 5. 1. 6. 2. 7. 3. 8. 4. 9.]
Looking for any tips to make this faster/more pythonic. My use case is with an array of ~10,000 values.
Upvotes: 2
Views: 160
Reputation: 14413
Reshape your vector into in a 2D array with N
elements per row, and then flatten it column-wise:
import numpy as np
# Pick "subsample stride"
N = 5
# Create a vector with length divisible by N.
arr = np.arange(2 * N)
print(arr)
# Reshape arr into a 2D array with N elements per row and however many
# columns required.
# Flatten it with "F" ordering for "Fortran style" (column-major).
output = arr.reshape(-1, N).flatten("F")
print(output)
outputs
[0 1 2 3 4 5 6 7 8 9]
[0 5 1 6 2 7 3 8 4 9]
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.31.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import numpy as np
In [2]: def sol0(arr):
...: """OP's original solution."""
...: res = np.empty(shape=0)
...: for i in range(5):
...: res = np.concatenate([res, arr[i::5]])
...: return res
...:
In [3]: def sol1(arr):
...: """This answer's solution."""
...: return arr.reshape(-1, 5).flatten("F")
...:
In [4]: def sol2(arr):
...: """@seralouk's solution, with shape error patch"""
...: res = np.empty((5, arr.size//5), order='F')
...: for i in range(5):
...: res[i::5] = arr[i::5]
...: return res.reshape(-1)
In [5]: arr = np.arange(10_000)
In [6]: %timeit sol0(arr)
26.6 µs ± 724 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [7]: %timeit sol1(arr)
7.81 µs ± 34 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [8]: %timeit sol2(arr)
36.3 µs ± 841 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Upvotes: 4