Suppose I have a numpy array that consists of pairs of values. I'd like to find all combinations of the pairs without tearing them apart. In particular, I was hoping for a numpy.meshgrid solution for this.
Imagine an array constructed like:
ab = np.array([[1,10], [2,20], [3,30], [4,40]])
Then my desired output is
>>> out: ([1,10], [2,20])
([1,10], [3,30])
([1,10], [4,40])
([2,20], [3,30])
([2,20], [4,40])
([3,30], [4,40])
The output can be either a np.array or a tuple (I can convert accordingly afterwards). Note that duplicates are omitted from my results, ignoring the order within each couple (if [[1,10], [2,20]] is already there, I don't want [[2,20], [1,10]] in my output). In the real case, ab is of size 30,000, so speed is also a concern.
That's why I tried meshgrid in the first place. For the simple case of single values, this is easily done (though still with duplicates):
a = np.array([1,2,3,4])
mesh = np.array(np.meshgrid(a,a)).T.reshape(-1,2)
>>> out: [[1 1]
[1 2]
[1 3]
[1 4]
[2 1]
[...]
[4 4]]
but for my pairs, my attempt of
mesh = np.array(np.meshgrid(ab,ab)).T
gives me
[[[ 1 1]
[ 1 10]
[ 1 2]
[ 1 20]
[ 1 3]
[ 1 30]
[ 1 4]
[ 1 40]]
[[10 1]
[10 10]
[10 2]
[10 20]
...
[40 3]
[40 30]
[40 4]
[40 40]]]
In other words: meshgrid breaks up my pairs. I assume the solution is near, but I couldn't come up with it on my own. Any help is appreciated, thanks!
Upvotes: 3
Views: 2750
I don't think meshgrid would work on its own, as it creates all possible combinations, so the duplicates would have to be filtered out afterwards. Here are two approaches that avoid that.
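For illustration only, here is a minimal sketch of what the meshgrid-plus-filtering route could look like, assuming the grid is built over row indices rather than over the values themselves. The names idx, keep and pairs_via_meshgrid are made up for this example; it builds the full n x n grid first, which is why it is not the recommended route.

```python
import numpy as np

ab = np.array([[1, 10], [2, 20], [3, 30], [4, 40]])

# Build the grid over row indices (not values), so each pair stays intact.
idx = np.arange(len(ab))
i, j = np.meshgrid(idx, idx, indexing="ij")

# Keep only index pairs with i < j: this drops self-pairs and mirrored duplicates.
keep = i < j
pairs_via_meshgrid = np.stack((ab[i[keep]], ab[j[keep]]), axis=1)

print(pairs_via_meshgrid.shape)  # (6, 2, 2)
```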
Approach #1
We can get the row indices of those pairwise combinations without duplicates and then simply index into rows to get the desired output, like so -
In [99]: r,c = np.triu_indices(len(ab),1)
In [100]: np.hstack(( ab[r], ab[c] ))
Out[100]:
array([[ 1, 10, 2, 20],
[ 1, 10, 3, 30],
[ 1, 10, 4, 40],
[ 2, 20, 3, 30],
[ 2, 20, 4, 40],
[ 3, 30, 4, 40]])
To get the desired output as a 3D array, stack along the second axis -
In [115]: np.stack(( ab[r], ab[c] ), axis=1)
Out[115]:
array([[[ 1, 10],
[ 2, 20]],
[[ 1, 10],
[ 3, 30]],
[[ 1, 10],
[ 4, 40]],
[[ 2, 20],
[ 3, 30]],
[[ 2, 20],
[ 4, 40]],
[[ 3, 30],
[ 4, 40]]])
As a function:
def pairwise_combs1(ab):
    r,c = np.triu_indices(len(ab),1)
    return np.stack(( ab[r], ab[c] ), axis=1)
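To see why this skips duplicates, it may help to look at the indices that np.triu_indices returns for the 4-row example. The second argument (k=1) excludes the main diagonal, so every row index in r is strictly smaller than its partner in c:

```python
import numpy as np

# Row/column indices of the strictly upper triangle of a 4x4 matrix.
r, c = np.triu_indices(4, 1)

print(r)  # [0 0 0 1 1 2]
print(c)  # [1 2 3 2 3 3]
```

Each (r[k], c[k]) is one unordered pair of rows, listed exactly once.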
Approach #2
Another approach uses slicing and array initialization, targeting memory efficiency and hence performance -
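def pairwise_combs2(ab):
    n = len(ab)
    N = n*(n-1)//2  # number of unordered pairs
    out = np.empty((N,2,2),dtype=ab.dtype)
    # Offsets into out: row j fills a block of n-1-j consecutive slots
    idx = np.concatenate(( [0], np.arange(n-1,0,-1).cumsum() ))
    start, stop = idx[:-1], idx[1:]
    for j in range(n-1):
        out[start[j]:stop[j],0] = ab[j]     # first element: row j, broadcast
        out[start[j]:stop[j],1] = ab[j+1:]  # second element: all later rows
    return out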
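As a small worked example of the offset bookkeeping, here is how idx splits the output for n = 4. The list name blocks is introduced only for this illustration:

```python
import numpy as np

n = 4
# Block sizes are 3, 2, 1 (row j pairs with the n-1-j rows after it);
# the cumulative sum turns those sizes into slice boundaries.
idx = np.concatenate(([0], np.arange(n - 1, 0, -1).cumsum()))
print(idx)  # [0 3 5 6]

# (row j, start, stop): row j fills out[start:stop]
blocks = [(j, int(s), int(e)) for j, (s, e) in enumerate(zip(idx[:-1], idx[1:]))]
print(blocks)  # [(0, 0, 3), (1, 3, 5), (2, 5, 6)]
```

So row 0 fills out[0:3], row 1 fills out[3:5], and row 2 fills out[5:6], for 6 pairs in total.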
Runtime test
In [166]: ab = np.random.randint(0,9,(1000,2))
In [167]: %timeit pairwise_combs1(ab)
10 loops, best of 3: 20 ms per loop
In [168]: %timeit pairwise_combs2(ab)
100 loops, best of 3: 6.25 ms per loop
In [169]: np.allclose(pairwise_combs1(ab), pairwise_combs2(ab))
Out[169]: True
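Before running either function on the real data, it may be worth estimating the output size. The sketch below assumes that "size 30,000" means 30,000 rows of 2 values and that the dtype is int64. Both are assumptions, and pairwise_output_nbytes is a hypothetical helper, not part of NumPy:

```python
import numpy as np

def pairwise_output_nbytes(n, dtype=np.int64):
    """Return (number of pairs, bytes needed) for an (N, 2, 2) output array."""
    n_pairs = n * (n - 1) // 2
    return n_pairs, n_pairs * 2 * 2 * np.dtype(dtype).itemsize

# Assumed input: 30,000 rows of int64 pairs.
n_pairs, nbytes = pairwise_output_nbytes(30_000)
print(n_pairs)  # 449985000
print(nbytes)   # 14399520000, i.e. roughly 14.4 GB
```

Under those assumptions the full result would not fit comfortably in memory on many machines, so a smaller dtype or processing the pairs in blocks might be needed.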
Upvotes: 7