user2258651
user2258651

Reputation: 393

Computing Inverse hamming distance among a list of words

I'd like to find the number of the same characters in the same position in each word of a list of words. So for example the end result would be a matrix of the words compared to other words in the list which shows inverse hamming distance between the two like so:Inverse Hamming Distance Matrix

Given that hamm_dist(a,b) = hamm_dist(b,a) I only really need to compute to the right of the diagonal. Is there a more efficient way to find these values than just a bunch of calls to compute the distance between any two?

Upvotes: 1

Views: 275

Answers (1)

cobarzan
cobarzan

Reputation: 682

If we are talking about the complexity of filling such a matrix, obviously you cannot do better than O(n^2), where n is the number of words. At least not in the worst case scenario.

But here is another solution that requires less char-to-char comparisons than the solution you are suggesting.

You assign an index to each word, thus creating tuples of the form (<index>, <word>). Also let L be the maximum length of a word in your input and M be your output matrix.

set all elements in M to 0
for i = 1 to L do
    sort the tuples using the ith character of the words as key
    for every pair w1 w2 of words having the same ith letter do
        M[w1,w2]++
    endfor
endfor

In other words, for each index position, sort the words using the character on that position as key and increase the counter for all the word pairs that have the same value on that position. As I assume that your alphabet is not very wide, you can use count sort. In fact, you do not really need to sort, but rather to place each word (actually the index you've assigned to the word) in the corresponding bucket (one bucket for each possible letter in your alphabet).

Complexity wise, this solution requires O(L*n) char-to-char "comparisons" and O(S) increase-by-one operations, where S is the sum of all the Hamming distances. Your solution seems to take O(L*n^2) char-to-char comparisons and O(S) increase-by-one operations. My "comparisons" are not actual comparisons, but just interrogations of the ith position in a word.

Upvotes: 1

Related Questions