Reputation: 13
I am trying to write hamming distance.
As input I do have two matrices M1 and M2, one is 40x20, second 50x20; they contain True/False. I need to calc distance between each row, so M1[0] distance with M2[0], M1[0] with M2[1] ... M1[39] with M2[49]. Resulting in 40x50 result matrix. My first try was of course looping like:
for x_i in range(X.shape[0]):
for x_train_j in range(X_train.shape[0]):
distance_results[x_i, x_train_j] = sum(np.logical_xor(x_array[x_i, :], x_train_array[x_train_j, :]))
it is correct, however, it is too slow for my data. I have some thoughts about multiplying M1.dot(M2.T)
, which gives right shape of matrix and does summing in one step but when multiplying and summing xor is desired so:
1 * 1 = 1
(need 0 - bad)
1 * 0 = 0
(need 1 - bad)
0 * 1 = 0
(need 1 - bad)
0 * 0 = 0
(need 0 - is OK)
Do you have any idea how can I get desired result? I think i miss some math to do it right and fast.
Thanks in advance.
Upvotes: 1
Views: 490
Reputation: 4332
M1.dot(M2.T)
could be in a way considered a AND
for Boolean logic. For binary logic A xor B
is equivalent to (A and not B) or (not A and B)
so therefore one thing you could do --
M1.dot(M2.T):
1 * 1 = 1
1 * 0 = 0
0 * 1 = 0
0 * 0 = 0
(not M1).dot(M2.T):
0 * 1 = 0
0 * 0 = 0
1 * 1 = 1
1 * 0 = 0
M1.dot(not M2.T):
1 * 0 = 0
1 * 1 = 1
0 * 0 = 0
0 * 1 = 0
(not M1).dot(M2.T) or M1.dot(not M2.T):
0 * 1 || 1 * 0 = 0
0 * 0 || 1 * 1 = 1
1 * 1 || 0 * 0 = 1
1 * 0 || 0 * 1 = 0
To code it up you need to be careful to cast out of logical before the dot so that the sum is calculated. If you dot on 2 logical arrays you wont get the result you want. I just use -1 as my not operator ( True - 1 = 0, False - 1 = -1)
. Also the 'or' here is really a sum since you are considering multiple bools. At the end you end up with your distances in the negative, easy to fix with absolute.
_A = [[ True, True, False],
[ True, False, True]]
_B = [[ True, False, False],
[ False, False, False],
[ True, True, True ]]
Expected output: [[ 1, 2, 1], [ 1, 2, 1]]
A = numpy( _A, dtype=bool)
B = numpy( _B, dtype=bool)
numpy.absolute(numpy.dot(A,B.T-1) + numpy.dot(A-1, B.T))
>>>array([[1, 2, 1],
[1, 2, 1]])
Upvotes: 1