pythonnumpyvectorizationcombinationscomputational-geometry

Reputation: 885

find all possible combinations of elements of an array. NOT PRODUCT

Ive looked into this question and people keep advising to use np.meshgrid() to find all possible combinations of an array. but the problem is np.meshgrid() does not produce combinations it produces products (similar to itertools.product())

in a combination, elements are non-repeating and unordered

arr = np.arange(5)
r = 3

These are what combinations look like

np.array(
    list(itertools.combinations(arr, r))
)

>>>   [[0, 1, 2],
       [0, 1, 3],
       [0, 1, 4],
       [0, 2, 3],
       [0, 2, 4],
       [0, 3, 4],
       [1, 2, 3],
       [1, 2, 4],
       [1, 3, 4],
       [2, 3, 4]]

following are not combinations

np.array(
    list(itertools.product(arr, arr, arr))
)

>>>   [[0, 0, 0],
       [0, 0, 1],
       [0, 0, 2],
       [0, 0, 3],
       [0, 0, 4],
       [0, 1, 0],
       [0, 1, 1],
       [0, 1, 2],
       ....,
       [4, 3, 2],
       [4, 3, 3],
       [4, 3, 4],
       [4, 4, 0],
       [4, 4, 1],
       [4, 4, 2],
       [4, 4, 3],
       [4, 4, 4]])

np.array(
    np.meshgrid(arr, arr, arr)
).transpose([2, 1, 3, 0]).reshape(-1, r)

>>>   [[0, 0, 0],
       [0, 0, 1],
       [0, 0, 2],
       [0, 0, 3],
       [0, 0, 4],
       [0, 1, 0],
       [0, 1, 1],
       [0, 1, 2],
       ....,
       [4, 3, 2],
       [4, 3, 3],
       [4, 3, 4],
       [4, 4, 0],
       [4, 4, 1],
       [4, 4, 2],
       [4, 4, 3],
       [4, 4, 4]])

for r = 2 i've found a neat way to find combinations

np.array(
    np.triu_indices(len(arr), 1)
).T

>>>   [[0, 1],
       [0, 2],
       [0, 3],
       [0, 4],
       [1, 2],
       [1, 3],
       [1, 4],
       [2, 3],
       [2, 4],
       [3, 4]]

but im having a hard time finding any vectorized methods for r > 2

NOTE: even if my array is not [0, 1, 2, 3, 4] i could use the above answers as indices.

if it helps to imagine,

for r = 2 the required answer is indices of top-right triangle of a square matrix of size len(arr), ignoring the diagonal.

for r = 3 the required answer is indices of top-right-upper tetrahedron (middle one in the image) of a 3d array of size (you guessed it) len(arr), ignoring the 3d equivalent of diagonal.

Upvotes: 3

Answers (3)

Hammad Ahmed

Reputation: 885

just running some bench marks for the answers

import numpy as np
import itertools
import perfplot

itertools implementation

def func0(n, r):
    
    r = np.array(
        list(itertools.combinations(range(n), r))
    )
    
    return r

@hpaulj 's implementation

def func1(n, r):
    
    prod = np.argwhere(np.ones(r*[n], dtype= bool))
    mask = True
    
    for i in range(r-1): mask = mask & (prod[:, i] < prod[:, i+1])
        
    return prod[mask]

@Ehsan 's implementation

def func2(n, r):
    
    rngs = (np.arange(n), )*r
    indx = np.ix_(*rngs)
    mask = True
    
    for i in range(r-1): mask = mask & (indx[i] < indx[i+1])
        
    return np.argwhere(mask)

performance w.r.t n

bench_n = perfplot.bench(
    n_range= range(4, 50),
    setup  = lambda n: n,
    kernels= [
        lambda n: func0(n, 4),
        lambda n: func1(n, 4),
        lambda n: func2(n, 4)
    ],
    labels = ['func0', 'func1', 'func2']
)

bench_n.show()

performance w.r.t r

bench_r = perfplot.bench(
    n_range= range(2, 8),
    setup  = lambda r: r,
    kernels= [
        lambda r: func0(10, r),
        lambda r: func1(10, r),
        lambda r: func2(10, r)
    ],
    labels = ['func0', 'func1', 'func2']
)

bench_r.show()

Upvotes: 1

hpaulj

Reputation: 231615

Look at the code:

def triu_indices(n, k=0, m=None):
    return nonzero(~tri(n, m, k=k-1, dtype=bool))

def tri(N, M=None, k=0, dtype=float):
    m = greater_equal.outer(arange(N, dtype=_min_int(0, N)),
                        arange(-k, M-k, dtype=_min_int(-k, M - k)))

So for a square:

In [75]: np.greater_equal.outer(np.arange(5),np.arange(5))                                           
Out[75]: 
array([[ True, False, False, False, False],
       [ True,  True, False, False, False],
       [ True,  True,  True, False, False],
       [ True,  True,  True,  True, False],
       [ True,  True,  True,  True,  True]])
In [76]: np.argwhere(~np.greater_equal.outer(np.arange(5),np.arange(5)))                             
Out[76]: 
array([[0, 1],
       [0, 2],
       [0, 3],
       [0, 4],
       [1, 2],
       [1, 3],
       [1, 4],
       [2, 3],
       [2, 4],
       [3, 4]])

Or using ix_ to create broadcastable ranges:

In [77]: I,J = np.ix_(np.arange(5),np.arange(5))                                                     
In [78]: I,J                                                                                         
Out[78]: 
(array([[0],
        [1],
        [2],
        [3],
        [4]]),
 array([[0, 1, 2, 3, 4]]))
In [79]: I>=J                                                                                        
Out[79]: 
array([[ True, False, False, False, False],
       [ True,  True, False, False, False],
       [ True,  True,  True, False, False],
       [ True,  True,  True,  True, False],
       [ True,  True,  True,  True,  True]])
In [80]: I<J                                                                                         
Out[80]: 
array([[False,  True,  True,  True,  True],
       [False, False,  True,  True,  True],
       [False, False, False,  True,  True],
       [False, False, False, False,  True],
       [False, False, False, False, False]])

Generalizing this to 3d:

In [90]: I,J,K = np.ix_(np.arange(5),np.arange(5),np.arange(5))                                      
In [91]: np.argwhere((I<J) & (J<K))                                                                  
Out[91]: 
array([[0, 1, 2],
       [0, 1, 3],
       [0, 1, 4],
       [0, 2, 3],
       [0, 2, 4],
       [0, 3, 4],
       [1, 2, 3],
       [1, 2, 4],
       [1, 3, 4],
       [2, 3, 4]])

This makes a (5,5,5) array, and then finds the True. So it's not surprising that its timeing isn't as good as itertools:

In [92]: %%timeit 
    ...: I,J,K = np.ix_(np.arange(5),np.arange(5),np.arange(5))   
    ...: np.argwhere((I<J) & (J<K))                                                                                             
60.4 µs ± 97 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [99]: timeit np.array(list(itertools.combinations(np.arange(5), 3)))                              
20.9 µs ± 1.74 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Upvotes: 1

Ehsan

Reputation: 12417

This is similar to your idea for 3-D:

n = 5
r = 3
a = np.argwhere(np.triu(np.ones((n,)*r),1))
a[a[:,0]<a[:,1]]

output:

[[0 1 2]
 [0 1 3]
 [0 1 4]
 [0 2 3]
 [0 2 4]
 [0 3 4]
 [1 2 3]
 [1 2 4]
 [1 3 4]
 [2 3 4]]

for 4-D (and so on) you can expand like this (Not sure of the performance):

n = 5
r = 4
a = np.argwhere(np.triu(np.ones((n,)*r),1))
a[(a[:,0]<a[:,1]) & (a[:,1]<a[:,2])]

output:

[[0 1 2 3]
 [0 1 2 4]
 [0 1 3 4]
 [0 2 3 4]
 [1 2 3 4]]

Itertools seems to be faster if that is what you are aiming at:

def m1(n):
  r = 3
  a = np.argwhere(np.triu(np.ones((n,)*r),1))
  return a[a[:,0]<a[:,1]]

def m2(n):
  r = 3
  return combinations(np.arange(n),r)

in_ = [5,10,100,200]

Upvotes: 2

find all possible combinations of elements of an array. NOT PRODUCT

Answers (3)

Related Questions