Hyuntae
Hyuntae

Reputation: 13

python pairwise subtract two elements in multi-dimensional array consisted of vectors

I wonder if there are very simple ways to calculate pairwise subtraction for two elements in a multi-dimensional array which is consisted of vectors USING a function in NUMPY or SCIPY library.

Let me introduce an example:

>>> a = (np.arange(9)).reshape((3,3)) # a list of 3 vectors (x, y, z)

>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

I want to get the following:

>>>> result
array([[3,3,3],
       [6,6,6],
       [3,3,3]])
# first element comes from [3,4,5] - [0,1,2]
# second element comes from [6,7,8] - [0,1,2]
# third element comes from [6,7,8] - [3,4,5]

I do not matter about the signs (+/-) on the result which depends on the order of subtraction of two vectors. But, I want to know VERY SIMPLE version of code using pre-defined functions in Scipy or Numpy libraries such as scipy.spatial.distance.pdist.

I do need loop codes to iterate elements-wise for result, instead, I need just one line to get the result.

Upvotes: 1

Views: 1649

Answers (2)

Divakar
Divakar

Reputation: 221524

Approach #1

Get the pairwise indices with np.triu_indices, index into rows of a with those and compute the differences -

In [8]: r,c = np.triu_indices(len(a),1)

In [9]: a[c] - a[r]
Out[9]: 
array([[3, 3, 3],
       [6, 6, 6],
       [3, 3, 3]])

Approach #2

We can also use slicing that avoids creating the indices and the indexing part itself that creates copies of the input array for the slices needed for subtraction. Thus, we would work with views only, but we need to iterate in that process. The advantage of slicing shows off for large arrays on performance, as we would verify later on in timings. The implementation would be -

n = len(a)
N = n*(n-1)//2
idx = np.concatenate(( [0], np.arange(n-1,0,-1).cumsum() ))
start, stop = idx[:-1], idx[1:]
out = np.empty((N,a.shape[1]),dtype=a.dtype)
for j,i in enumerate(range(n-1)):
    out[start[j]:stop[j]] = a[i+1:] - a[i,None]

Runtime test

Approaches as funcs -

def pairwise_row_diff_triu_indices(a):
    r,c = np.triu_indices(len(a),1)
    out = a[c] - a[r]
    return out

def pairwise_row_diff_slicing(a):
    n = len(a)
    N = n*(n-1)//2
    idx = np.concatenate(( [0], np.arange(n-1,0,-1).cumsum() ))
    start, stop = idx[:-1], idx[1:]
    out = np.empty((N,a.shape[1]),dtype=a.dtype)
    for j,i in enumerate(range(n-1)):
        out[start[j]:stop[j]] = a[i+1:] - a[i,None]
    return out

Timings -

In [53]: np.random.seed(0)

In [54]: a = np.random.randint(0,9,(1000,3))

In [55]: %timeit pairwise_row_diff_triu_indices(a)
    ...: %timeit pairwise_row_diff_slicing(a)
10 loops, best of 3: 21 ms per loop
100 loops, best of 3: 6.01 ms per loop

In [56]: a = np.random.randint(0,9,(5000,3))

In [57]: %timeit pairwise_row_diff_triu_indices(a)
    ...: %timeit pairwise_row_diff_slicing(a)
1 loop, best of 3: 456 ms per loop
10 loops, best of 3: 110 ms per loop

Upvotes: 3

Sevy
Sevy

Reputation: 698

This is not using numpy or scipy functions...but it is a simple solution

length = a.shape[1]
new_arr = []
for ii in range(length):
    for jj in range(ii+1,length):
        new_arr.append(a[ii,]-a[jj,])

Upvotes: 0

Related Questions