JKC

Reputation: 2618

How to find the difference between two matrices in Python so that the result has no negative values

I have a pandas DataFrame with two columns (Word and Word_Position). I need to find the distance between words and present the output in matrix form for better readability.

What I have done so far is create a row matrix from the DF.Word_Position column and transpose it to create a column matrix. When I subtract one matrix from the other, some of the resulting values are negative.

Mathematically this is absolutely correct, but for my requirement I just need the magnitude and not the minus sign.

Is there a better way to do this? I appreciate your help. Thanks in advance.

Note: I am using Python 3.6.

Code snippets and their corresponding output for your reference:

m1 = np.matrix(df1['Word Position'])
print(m1)
[[ 1  2  3 ..., 19 20 21]]

m2 = np.matrix(m1.T)
print(m2)
[[ 1]
 [ 2]
 [ 3]
 ..., 
 [19]
 [20]
 [21]]

print(m2-m1)
[[  0  -1  -2 ..., -18 -19 -20]
 [  1   0  -1 ..., -17 -18 -19]
 [  2   1   0 ..., -16 -17 -18]
 ..., 
 [ 18  17  16 ...,   0  -1  -2]
 [ 19  18  17 ...,   1   0  -1]
 [ 20  19  18 ...,   2   1   0]]

Upvotes: 4

Views: 17663

Answers (3)

tupui

Reputation: 6528

If you want the distance between two arrays, the proper way is to compute the norm:

dists = [np.linalg.norm(m - m2, axis=1) for m in m1]

This assumes that the arrays have shape (n_sample, n_dimension).

Instead of a list comprehension, you can use NumPy broadcasting on m2.
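
A minimal sketch of what the broadcasting version might look like, assuming both inputs are plain NumPy arrays of shape (n_sample, n_dimension). The names a and b and the sample values are made up for illustration and are not from the question:

```python
import numpy as np

# Hypothetical sample data: 3 points and 2 points, each in 2 dimensions.
a = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
b = np.array([[0.0, 0.0], [6.0, 8.0]])

# a[:, None, :] has shape (3, 1, 2) and b[None, :, :] has shape (1, 2, 2),
# so the difference broadcasts to shape (3, 2, 2).
diff = a[:, None, :] - b[None, :, :]

# The norm over the last axis gives a (3, 2) matrix of Euclidean distances,
# where dists[i, j] is the distance between a[i] and b[j].
dists = np.linalg.norm(diff, axis=-1)
print(dists)
```

This builds the full intermediate array in memory, so it trades memory for the convenience of avoiding a Python-level loop.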


If you want more control over the metric, you might want to use scipy.spatial.distance.cdist. This option is faster with large arrays. An example with the Minkowski distance (p=2 for Euclidean distance):

dists = [scipy.spatial.distance.cdist(m, m2, 'minkowski', p=2) for m in m1]

Of course, if the array is only 1D you can achieve that using an absolute value:

dists = np.abs(m1 - m2)

Upvotes: 1

Daniel F

Reputation: 14399

In this case, you probably want to use scipy.spatial.distance.pdist

from scipy.spatial.distance import squareform, pdist
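# pdist expects a 2-D array, so turn the column into shape (n, 1)
m = df1['Word Position'].values[:, None]
# Minkowski distance with p=1 is the absolute difference in one dimension
dist = squareform(pdist(m, 'minkowski', p=1))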

This is a bit overkill here, but it is extensible if you ever want to change the distance metric, and it is usually faster than broadcasting, since pdist computes each pair only once (abs(a-b) == abs(b-a)). If you prefer broadcasting, you could always do this:

dist = np.abs(m - m.T)
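
As a rough check that the two approaches agree, here is a small self-contained sketch. The positions array is a hypothetical stand-in for the 'Word Position' column, not the asker's actual data:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical stand-in for the 'Word Position' column.
positions = np.arange(1, 6)

# pdist expects a 2-D array of shape (n_samples, n_features).
m = positions[:, None]

# Minkowski with p=1 is the Manhattan distance, i.e. |a - b| in one dimension.
# squareform expands the condensed result into a full symmetric matrix.
dist_pdist = squareform(pdist(m, 'minkowski', p=1))

# Same idea via broadcasting: (5, 1) - (1, 5) -> (5, 5).
dist_broadcast = np.abs(m - m.T)

# Compare the two results (pdist returns floats, broadcasting keeps integers).
print(np.allclose(dist_pdist, dist_broadcast))
```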

Upvotes: 1

Alexander

Reputation: 109546

Just take the absolute value?

np.abs(m2 - m1)

Your code indicates that your data consists of numpy arrays, so the solution above should work.

If they are dataframes, you could do:

m2.sub(m1).abs()
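
Since the question asks for a readable matrix, one possible sketch is to wrap the absolute differences in a DataFrame labelled with the words. The sample words and positions below are invented for illustration, and the column names are assumed to match the question's code:

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's DataFrame.
df1 = pd.DataFrame({'Word': ['the', 'quick', 'brown', 'fox'],
                    'Word Position': [1, 2, 3, 4]})

pos = df1['Word Position'].values
words = df1['Word'].values

# Column vector minus row vector broadcasts to an (n, n) grid of differences;
# np.abs drops the sign.
dist = np.abs(pos[:, None] - pos[None, :])

# Label rows and columns with the words for readability.
dist_df = pd.DataFrame(dist, index=words, columns=words)
print(dist_df)
```

Using plain 1-D arrays here, rather than np.matrix, keeps the broadcasting explicit.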

Upvotes: 5
