I want to calculate the pairwise Hamming distance of a 2D NumPy array.
My array is A:
array([[-1, 0, -1, 0, -1, 0],
[ 1, 0, 0, 0, 0, 0],
[ 0, 0, 1, 1, 1, 0],
[ 0, 0, -1, 1, 0, 0],
[ 0, 0, 0, 0, -1, 0]], dtype=int8)
I want to calculate the Hamming distance between the rows of A, but considering only non-zero values: if either of the two entries at a position is zero, that position is not included in the calculation.
The expected output B is:
array([[0, 1, 2, 0, 0],
[1, 0, 0, 0, 0],
[2, 0, 0, 1, 1],
[0, 0, 1, 0, 0],
[0, 0, 1, 0, 0]], dtype=int8)
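To pin down the definition, here is a minimal brute-force sketch (the function name `masked_hamming_loop` is hypothetical, not part of NumPy). For every pair of rows it counts the positions where both entries are non-zero and the entries differ. It is slow for large arrays, but it is handy as a reference for checking vectorized versions.

```python
import numpy as np


def masked_hamming_loop(A):
    """Hypothetical reference version: for each pair of rows, count positions
    where both entries are non-zero and the entries differ."""
    n = A.shape[0]
    B = np.zeros((n, n), dtype=np.int64)
    for i in range(n):
        for j in range(n):
            both_nonzero = (A[i] != 0) & (A[j] != 0)
            B[i, j] = np.count_nonzero(both_nonzero & (A[i] != A[j]))
    return B


A = np.array([[-1, 0, -1, 0, -1, 0],
              [ 1, 0,  0, 0,  0, 0],
              [ 0, 0,  1, 1,  1, 0],
              [ 0, 0, -1, 1,  0, 0],
              [ 0, 0,  0, 0, -1, 0]], dtype=np.int8)

print(masked_hamming_loop(A))
```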
If your arrays contain only -1, 0 and +1, then you have the following property: the product of two rows, r1 * r2, will contain 0 in missing locations (where either entry is 0), -1 where the elements differ, and +1 where they are the same. You therefore want to multiply every pair of rows together and, for each pair, count the number of entries that are less than zero.
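A small sketch of that property on a single pair, assuming rows 0 and 2 of the question's A (values restricted to -1, 0 and +1): the negative entries of the product mark exactly the positions that count toward the distance.

```python
import numpy as np

# Rows 0 and 2 of the question's array A
r1 = np.array([-1, 0, -1, 0, -1, 0], dtype=np.int8)
r2 = np.array([ 0, 0,  1, 1,  1, 0], dtype=np.int8)

# 0 where either entry is 0, -1 where the entries differ, +1 where they are equal
prod = r1 * r2
print(prod)                         # [ 0  0 -1  0 -1  0]
print(np.count_nonzero(prod < 0))   # 2, which is B[0, 2] in the expected output
```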
You can compute all the pairwise products at once with broadcasting:
B = np.count_nonzero(A[:, None, :] * A[None, :, :] < 0, axis=-1)
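To make the broadcasting step concrete, here is a sketch on the question's A that prints the intermediate shape. np.count_nonzero with an axis returns a platform integer type rather than int8, so the sketch casts at the end; that cast is only needed if you want the int8 dtype shown in the question.

```python
import numpy as np

A = np.array([[-1, 0, -1, 0, -1, 0],
              [ 1, 0,  0, 0,  0, 0],
              [ 0, 0,  1, 1,  1, 0],
              [ 0, 0, -1, 1,  0, 0],
              [ 0, 0,  0, 0, -1, 0]], dtype=np.int8)

# A[:, None, :] has shape (5, 1, 6) and A[None, :, :] has shape (1, 5, 6),
# so broadcasting multiplies every row by every row, giving shape (5, 5, 6).
products = A[:, None, :] * A[None, :, :]
print(products.shape)  # (5, 5, 6)

# For each (i, j) pair, count the negative products along the last axis.
B = np.count_nonzero(products < 0, axis=-1)

# Optional: match the int8 dtype used in the question.
B = B.astype(np.int8)
print(B)
```

Note that the intermediate array holds n * n * m elements for n rows of length m, so for many rows it may be worth processing the rows in blocks.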
If you need to generalize to values that are not always -1 and +1, you can use a similar trick to check for equality explicitly. For two items a and b, the quantity a * b * (a - b) will be non-zero if and only if both items are non-zero and different (in exact arithmetic; with a narrow integer dtype such as int8, large values can overflow):
A1 = A[:, None, :]
A2 = A[None, :, :]
B = np.count_nonzero(A1 * A2 * (A1 - A2), axis=-1)
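A sketch to sanity-check the a * b * (a - b) property and its overflow caveat. It assumes int8 input, as in the question. In int64 the property holds for every pair of int8 values, but evaluated directly in int8, NumPy array arithmetic wraps around silently, so a product such as 16 * -16 = -256 becomes 0 and a real difference is missed. For the question's -1/0/+1 data the product stays within ±2, so int8 is safe there.

```python
import numpy as np

# Exhaustive check over every pair of int8 values, computed in int64
# so that nothing can overflow.
v = np.arange(-128, 128, dtype=np.int64)
a, b = v[:, None], v[None, :]
trick = a * b * (a - b) != 0
explicit = (a != b) & (a != 0) & (b != 0)
print(np.array_equal(trick, explicit))  # True

# The same expression evaluated in int8 wraps around silently:
# 16 * -16 = -256, which becomes 0 in int8, so the difference is missed.
x = np.array([[16], [-16]], dtype=np.int8)
X1, X2 = x[:, None, :], x[None, :, :]
wrapped = np.count_nonzero(X1 * X2 * (X1 - X2), axis=-1)
print(wrapped)  # all zeros, although the two rows differ

# Widening the dtype before multiplying avoids the overflow for int8 input.
w = x.astype(np.int64)
W1, W2 = w[:, None, :], w[None, :, :]
widened = np.count_nonzero(W1 * W2 * (W1 - W2), axis=-1)
print(widened)  # 1 off the diagonal, 0 on it
```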
If you prefer to write the condition out explicitly, you can use:
np.count_nonzero((A1 != A2) & (A1 != 0) & (A2 != 0), axis=-1)
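A sketch that checks that the three formulations agree on the question's A. The helper name `masked_hamming` is hypothetical. It wraps the explicit boolean form, which does no arithmetic on the values and so cannot overflow.

```python
import numpy as np


def masked_hamming(A):
    """Hypothetical helper: pairwise Hamming distance ignoring zero entries,
    using the explicit boolean condition."""
    A1, A2 = A[:, None, :], A[None, :, :]
    return np.count_nonzero((A1 != A2) & (A1 != 0) & (A2 != 0), axis=-1)


A = np.array([[-1, 0, -1, 0, -1, 0],
              [ 1, 0,  0, 0,  0, 0],
              [ 0, 0,  1, 1,  1, 0],
              [ 0, 0, -1, 1,  0, 0],
              [ 0, 0,  0, 0, -1, 0]], dtype=np.int8)

A1, A2 = A[:, None, :], A[None, :, :]
by_sign = np.count_nonzero(A1 * A2 < 0, axis=-1)             # needs values in {-1, 0, +1}
by_product = np.count_nonzero(A1 * A2 * (A1 - A2), axis=-1)  # |product| <= 2 here, no overflow
explicit = masked_hamming(A)

print(np.array_equal(by_sign, explicit), np.array_equal(by_product, explicit))  # True True
print(explicit)
```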
I have a feeling there should be an easier way to do this. Speed should be fine, since everything is array-based, but readability suffers a little. Here is a working solution:
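import numpy as np
from itertools import permutations

a = A  # the array from the question
b = np.zeros((a.shape[0], a.shape[0]))  # float result, as in the output below
idx = np.array(list(permutations(range(a.shape[0]), 2)))  # all ordered row pairs (i, j), i != j
p, q = a[idx.T]  # p[k] and q[k] are the two rows of pair k
b[tuple(idx.T)] = np.count_nonzero(np.logical_and(p - q, np.logical_and(p != 0, q != 0)), axis=1)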
You first create all ordered pairs of row indices with itertools.permutations. Then, for each pair of rows, you count the positions where the two rows differ (their difference is non-zero) and both entries are non-zero.
Output:
[[0. 1. 2. 0. 0.]
[1. 0. 0. 0. 0.]
[2. 0. 0. 1. 1.]
[0. 0. 1. 0. 0.]
[0. 0. 1. 0. 0.]]
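Because the distance is symmetric, a variant of the same index-pair idea only needs to visit each unordered pair once. The sketch below swaps itertools.permutations for np.triu_indices(n, k=1), which yields the pairs with i < j, and then mirrors the counts into the lower triangle. The function name `masked_hamming_triu` is hypothetical. The diagonal stays 0 because the result starts as zeros, and the integer dtype avoids the float output shown above.

```python
import numpy as np


def masked_hamming_triu(a):
    """Hypothetical variant of the index-pair approach: compute each unordered
    pair (i, j) with i < j once, then mirror it, since the distance is symmetric."""
    n = a.shape[0]
    b = np.zeros((n, n), dtype=np.int64)
    i, j = np.triu_indices(n, k=1)  # all pairs with i < j
    p, q = a[i], a[j]               # the two rows of each pair
    counts = np.count_nonzero((p != q) & (p != 0) & (q != 0), axis=1)
    b[i, j] = counts
    b[j, i] = counts                # fill the lower triangle too
    return b


a = np.array([[-1, 0, -1, 0, -1, 0],
              [ 1, 0,  0, 0,  0, 0],
              [ 0, 0,  1, 1,  1, 0],
              [ 0, 0, -1, 1,  0, 0],
              [ 0, 0,  0, 0, -1, 0]], dtype=np.int8)

print(masked_hamming_triu(a))
```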