Reputation: 885
Ive looked into this question and people keep advising to use np.meshgrid()
to find all possible combinations of an array. but the problem is np.meshgrid()
does not produce combinations it produces products (similar to itertools.product())
in a combination, elements are non-repeating and unordered
arr = np.arange(5)
r = 3
These are what combinations look like
np.array(
list(itertools.combinations(arr, r))
)
>>> [[0, 1, 2],
[0, 1, 3],
[0, 1, 4],
[0, 2, 3],
[0, 2, 4],
[0, 3, 4],
[1, 2, 3],
[1, 2, 4],
[1, 3, 4],
[2, 3, 4]]
following are not combinations
np.array(
list(itertools.product(arr, arr, arr))
)
>>> [[0, 0, 0],
[0, 0, 1],
[0, 0, 2],
[0, 0, 3],
[0, 0, 4],
[0, 1, 0],
[0, 1, 1],
[0, 1, 2],
....,
[4, 3, 2],
[4, 3, 3],
[4, 3, 4],
[4, 4, 0],
[4, 4, 1],
[4, 4, 2],
[4, 4, 3],
[4, 4, 4]])
np.array(
np.meshgrid(arr, arr, arr)
).transpose([2, 1, 3, 0]).reshape(-1, r)
>>> [[0, 0, 0],
[0, 0, 1],
[0, 0, 2],
[0, 0, 3],
[0, 0, 4],
[0, 1, 0],
[0, 1, 1],
[0, 1, 2],
....,
[4, 3, 2],
[4, 3, 3],
[4, 3, 4],
[4, 4, 0],
[4, 4, 1],
[4, 4, 2],
[4, 4, 3],
[4, 4, 4]])
for r = 2
i've found a neat way to find combinations
np.array(
np.triu_indices(len(arr), 1)
).T
>>> [[0, 1],
[0, 2],
[0, 3],
[0, 4],
[1, 2],
[1, 3],
[1, 4],
[2, 3],
[2, 4],
[3, 4]]
but im having a hard time finding any vectorized methods for r > 2
NOTE: even if my array is not
[0, 1, 2, 3, 4]
i could use the above answers as indices.
if it helps to imagine,
for r = 2
the required answer is indices of top-right triangle of a square matrix of size len(arr)
, ignoring the diagonal.
for r = 3
the required answer is indices of top-right-upper tetrahedron (middle one in the image) of a 3d array of size (you guessed it) len(arr)
, ignoring the 3d equivalent of diagonal.
Upvotes: 3
Views: 734
Reputation: 885
just running some bench marks for the answers
import numpy as np
import itertools
import perfplot
itertools
implementation
def func0(n, r):
r = np.array(
list(itertools.combinations(range(n), r))
)
return r
@hpaulj 's implementation
def func1(n, r):
prod = np.argwhere(np.ones(r*[n], dtype= bool))
mask = True
for i in range(r-1): mask = mask & (prod[:, i] < prod[:, i+1])
return prod[mask]
@Ehsan 's implementation
def func2(n, r):
rngs = (np.arange(n), )*r
indx = np.ix_(*rngs)
mask = True
for i in range(r-1): mask = mask & (indx[i] < indx[i+1])
return np.argwhere(mask)
performance w.r.t n
bench_n = perfplot.bench(
n_range= range(4, 50),
setup = lambda n: n,
kernels= [
lambda n: func0(n, 4),
lambda n: func1(n, 4),
lambda n: func2(n, 4)
],
labels = ['func0', 'func1', 'func2']
)
bench_n.show()
performance w.r.t r
bench_r = perfplot.bench(
n_range= range(2, 8),
setup = lambda r: r,
kernels= [
lambda r: func0(10, r),
lambda r: func1(10, r),
lambda r: func2(10, r)
],
labels = ['func0', 'func1', 'func2']
)
bench_r.show()
Upvotes: 1
Reputation: 231615
Look at the code:
def triu_indices(n, k=0, m=None):
return nonzero(~tri(n, m, k=k-1, dtype=bool))
def tri(N, M=None, k=0, dtype=float):
m = greater_equal.outer(arange(N, dtype=_min_int(0, N)),
arange(-k, M-k, dtype=_min_int(-k, M - k)))
So for a square:
In [75]: np.greater_equal.outer(np.arange(5),np.arange(5))
Out[75]:
array([[ True, False, False, False, False],
[ True, True, False, False, False],
[ True, True, True, False, False],
[ True, True, True, True, False],
[ True, True, True, True, True]])
In [76]: np.argwhere(~np.greater_equal.outer(np.arange(5),np.arange(5)))
Out[76]:
array([[0, 1],
[0, 2],
[0, 3],
[0, 4],
[1, 2],
[1, 3],
[1, 4],
[2, 3],
[2, 4],
[3, 4]])
Or using ix_
to create broadcastable ranges:
In [77]: I,J = np.ix_(np.arange(5),np.arange(5))
In [78]: I,J
Out[78]:
(array([[0],
[1],
[2],
[3],
[4]]),
array([[0, 1, 2, 3, 4]]))
In [79]: I>=J
Out[79]:
array([[ True, False, False, False, False],
[ True, True, False, False, False],
[ True, True, True, False, False],
[ True, True, True, True, False],
[ True, True, True, True, True]])
In [80]: I<J
Out[80]:
array([[False, True, True, True, True],
[False, False, True, True, True],
[False, False, False, True, True],
[False, False, False, False, True],
[False, False, False, False, False]])
Generalizing this to 3d:
In [90]: I,J,K = np.ix_(np.arange(5),np.arange(5),np.arange(5))
In [91]: np.argwhere((I<J) & (J<K))
Out[91]:
array([[0, 1, 2],
[0, 1, 3],
[0, 1, 4],
[0, 2, 3],
[0, 2, 4],
[0, 3, 4],
[1, 2, 3],
[1, 2, 4],
[1, 3, 4],
[2, 3, 4]])
This makes a (5,5,5) array, and then finds the True. So it's not surprising that its timeing isn't as good as itertools
:
In [92]: %%timeit
...: I,J,K = np.ix_(np.arange(5),np.arange(5),np.arange(5))
...: np.argwhere((I<J) & (J<K))
60.4 µs ± 97 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [99]: timeit np.array(list(itertools.combinations(np.arange(5), 3)))
20.9 µs ± 1.74 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Upvotes: 1
Reputation: 12417
This is similar to your idea for 3-D:
n = 5
r = 3
a = np.argwhere(np.triu(np.ones((n,)*r),1))
a[a[:,0]<a[:,1]]
output:
[[0 1 2]
[0 1 3]
[0 1 4]
[0 2 3]
[0 2 4]
[0 3 4]
[1 2 3]
[1 2 4]
[1 3 4]
[2 3 4]]
for 4-D (and so on) you can expand like this (Not sure of the performance):
n = 5
r = 4
a = np.argwhere(np.triu(np.ones((n,)*r),1))
a[(a[:,0]<a[:,1]) & (a[:,1]<a[:,2])]
output:
[[0 1 2 3]
[0 1 2 4]
[0 1 3 4]
[0 2 3 4]
[1 2 3 4]]
Itertools seems to be faster if that is what you are aiming at:
def m1(n):
r = 3
a = np.argwhere(np.triu(np.ones((n,)*r),1))
return a[a[:,0]<a[:,1]]
def m2(n):
r = 3
return combinations(np.arange(n),r)
in_ = [5,10,100,200]
Upvotes: 2