offeltoffel

Reputation: 2801

Finding all combinations of paired numpy arrays via meshgrid

Suppose I have a numpy array that consists of pairs of values. I'd like to find all combinations of the pairs without tearing them apart. In particular, I was hoping for a numpy.meshgrid solution for this.

Imagine an array constructed like:

ab = np.array([[1,10], [2,20], [3,30], [4,40]])

Then my desired output is

>>> out: ([1,10], [2,20])
         ([1,10], [3,30])
         ([1,10], [4,40])
         ([2,20], [3,30])
         ([2,20], [4,40])
         ([3,30], [4,40])

The output can be either a np.array or a tuple (I can convert accordingly afterwards). Note that duplicates are omitted from my results regardless of the order within a combination (if [[1,10], [2,20]] is already there, I don't want [[2,20], [1,10]] in my output). For the real case, ab is of size 30,000, so speed matters as well.

That's why I tried meshgrid in the first place. For the simple case of single values, this is easily done (though still with the duplicates):

a = np.array([1,2,3,4])
mesh = np.array(np.meshgrid(a,a)).T.reshape(-1,2)
>>> out: [[1 1]
          [1 2]
          [1 3]
          [1 4]
          [2 1]
          [...]
          [4 4]]

but for my pairs, my attempt of

mesh = np.array(np.meshgrid(ab,ab)).T

gives me

[[[ 1  1]
  [ 1 10]
  [ 1  2]
  [ 1 20]
  [ 1  3]
  [ 1 30]
  [ 1  4]
  [ 1 40]]

 [[10  1]
  [10 10]
  [10  2]
  [10 20]
...    
  [40  3]
  [40 30]
  [40  4]
  [40 40]]]

In other words: meshgrid breaks up my pairs. I assume the solution is near, but I couldn't come up with it on my own. Any help is appreciated, thanks!

Upvotes: 3

Views: 2750

Answers (1)

Divakar

Reputation: 221614

I don't think meshgrid is a good fit here: it creates all possible combinations, so the duplicates would have to be filtered out afterwards. Two approaches can be proposed instead.
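For reference, here is a minimal sketch of what the meshgrid-plus-filtering route could look like. It is not one of the two approaches in this answer. It assumes the grid is built over row indices rather than over the values themselves (which is what keeps the pairs intact), and the names i, j, keep and pairs are only illustrative.

```python
import numpy as np

ab = np.array([[1, 10], [2, 20], [3, 30], [4, 40]])

# Build the grid over row indices, not over the values, so pairs stay intact.
n = len(ab)
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")

# Keep only index pairs with i < j: this drops self-pairs and mirrored duplicates.
keep = i < j
pairs = np.stack((ab[i[keep]], ab[j[keep]]), axis=1)  # shape (n*(n-1)//2, 2, 2)
```

This still materializes two full n-by-n index grids before filtering, which is the overhead the approaches below avoid.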

Approach #1

We can get the row indices of those pairwise combinations without duplicates and then simply index into rows to get the desired output, like so -

In [99]: r,c = np.triu_indices(len(ab),1)

In [100]: np.hstack(( ab[r], ab[c] ))
Out[100]: 
array([[ 1, 10,  2, 20],
       [ 1, 10,  3, 30],
       [ 1, 10,  4, 40],
       [ 2, 20,  3, 30],
       [ 2, 20,  4, 40],
       [ 3, 30,  4, 40]])

To get the desired output as a 3D array, stack along the second axis -

In [115]: np.stack(( ab[r], ab[c] ), axis=1)
Out[115]: 
array([[[ 1, 10],
        [ 2, 20]],

       [[ 1, 10],
        [ 3, 30]],

       [[ 1, 10],
        [ 4, 40]],

       [[ 2, 20],
        [ 3, 30]],

       [[ 2, 20],
        [ 4, 40]],

       [[ 3, 30],
        [ 4, 40]]])

As a function:

def pairwise_combs1(ab):
    r,c = np.triu_indices(len(ab),1)
    return np.stack(( ab[r], ab[c] ), axis=1)
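As a quick sanity check, the result can be compared against itertools.combinations on the small example from the question. This is only a sketch for verification; the pure-Python route would likely be far too slow for large inputs, and the names expected and result are illustrative.

```python
import itertools
import numpy as np

def pairwise_combs1(ab):
    r, c = np.triu_indices(len(ab), 1)
    return np.stack((ab[r], ab[c]), axis=1)

ab = np.array([[1, 10], [2, 20], [3, 30], [4, 40]])

# combinations() yields index-ordered pairs (0,1), (0,2), ..., (2,3),
# which is the same order that triu_indices produces.
expected = np.array(list(itertools.combinations(ab.tolist(), 2)))
result = pairwise_combs1(ab)

print(np.array_equal(result, expected))
```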

Approach #2

Another approach uses slicing and array initialization, aiming for memory efficiency and hence performance -

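import numpy as np

def pairwise_combs2(ab):
    n = len(ab)
    N = n*(n-1)//2  # number of unordered pairs
    out = np.empty((N,2,2),dtype=ab.dtype)
    # Row j is paired with the n-1-j rows after it; idx holds the block offsets.
    idx = np.concatenate(( [0], np.arange(n-1,0,-1).cumsum() ))
    start, stop = idx[:-1], idx[1:]
    for j in range(n-1):
        out[start[j]:stop[j],0] = ab[j]      # first element: row j, broadcast
        out[start[j]:stop[j],1] = ab[j+1:]   # second element: all later rows
    return out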
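To make the offset arithmetic concrete, here is a small worked sketch of the idx computation for n = 4, the size of the example in the question. The variable blocks is only illustrative and is not part of the function above.

```python
import numpy as np

n = 4
# Block sizes are n-1, n-2, ..., 1 (here 3, 2, 1); the cumulative sum gives
# the end of each block, and prepending 0 gives the start of the first one.
idx = np.concatenate(([0], np.arange(n - 1, 0, -1).cumsum()))

# Each (start, stop) slice of the output holds the pairs that begin with row j.
blocks = list(zip(idx[:-1].tolist(), idx[1:].tolist()))
print(idx.tolist())  # [0, 3, 5, 6]
print(blocks)        # [(0, 3), (3, 5), (5, 6)]
```

So output rows 0 to 2 hold the pairs starting with ab[0], rows 3 to 4 those starting with ab[1], and row 5 the single pair starting with ab[2].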

Runtime test

In [166]: ab = np.random.randint(0,9,(1000,2))

In [167]: %timeit pairwise_combs1(ab)
10 loops, best of 3: 20 ms per loop

In [168]: %timeit pairwise_combs2(ab)
100 loops, best of 3: 6.25 ms per loop

In [169]: np.allclose(pairwise_combs1(ab), pairwise_combs2(ab))
Out[169]: True
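The timings above are for 1,000 rows. For an input of the size mentioned in the question, the full output may not fit in memory, so it can help to estimate its size first and, if needed, process the pairs block by block. The sketch below assumes 8-byte elements (for example int64); full_output_bytes and iter_pair_blocks are hypothetical helper names, not part of the approaches above.

```python
import numpy as np

def full_output_bytes(n, itemsize=8):
    """Bytes needed for the full (N, 2, 2) result, with N = n*(n-1)//2."""
    return n * (n - 1) // 2 * 4 * itemsize

def iter_pair_blocks(ab):
    """Yield, for each row j, an (n-1-j, 2, 2) block pairing ab[j] with all later rows."""
    n = len(ab)
    for j in range(n - 1):
        block = np.empty((n - 1 - j, 2, 2), dtype=ab.dtype)
        block[:, 0] = ab[j]       # first element of every pair: row j
        block[:, 1] = ab[j + 1:]  # second element: each later row
        yield block

# 30,000 rows -> 449,985,000 pairs -> about 14.4 GB at 8 bytes per element.
print(full_output_bytes(30000))  # 14399520000
```

Concatenating the yielded blocks reproduces the output of the functions above, but each block can also be consumed and discarded before the next one is built.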

Upvotes: 7
