robinhood91
robinhood91

Reputation: 1691

Find equivalent words in a list through arbitrary mapping

Imagine you have a list of words:

['cat', 'ant', 'bro', 'gro']

Using some arbitrary mapping that we construct ourselves {'c'=>'a', 'a'=>'n', 't'=>'t' }, we can map 'cat' to 'ant', and similarly we can find some arbitrary mapping to convert 'bro' to 'gro'.

This is the idea of finding words that are equivalent. I wrote a function that compares two words, and checks if they are equivalent through a mapping I construct on the fly:

def compareWords(w1, w2):
   mapping = {}
   for i in xrange(0, len(w1)):
       if w1[i] in map:
           if mapping[w1[i]] == w2[i]:
               continue
           else:
               return False
        else:
           mapping[w1[i]] = w2[i]

    return True

Example input:

['cat', 'ant', 'bro', 'gro']

Example output:

[['cat','ant'], ['bro', 'gro']]

Using this function on each pair of words in the list to return an output list of lists of equivalent words runs in O(n^2) since every pair will need to be compared. I haven't implemented this part where I use this method above on the input list and generate the output list, since this method isn't the efficient comparison I am looking for. Is there a way to find equivalent words in this input list in O(n) time?

Further Explanation: If I have a list of words, and I want to find all the "equivalent" words, I would need to make the checks in pairs of words. If all letters of the words I am comparing are unique, then another word in the list is only equivalent to this first word if all the letters in the second word are also unique. so abc can be mapped to xyz if xyz is in the list. xyz can be mapped to pqr if xyz is in the list. Given this, abc, xyz, and pqr are all equivalent. This is the kind of grouping I am looking for.

Upvotes: 3

Views: 300

Answers (2)

Matt Jordan
Matt Jordan

Reputation: 2181

If I understand your request, you want to group words such that every relation could be unique, not necessarily that it is unique. Using your examples, mom ~ dad ~ bab, but bad couldn't exist within that set since no mapping that can map from mom to dad (m->d, o->a) or dad to bab (d->b, a->a) can map to bad (for mom, m->b AND d? for dad, d to b once and skip the next?).

Assuming that your grouping is really commutative, then as soon as you have grouped a word, you should never have to revisit it, except to check against the first word of each group.

So, you would start by adding your first word to your first group. Then, for each additional word, you would need to compare it with the first word in each existing group - if it matches, you add it to the group; if it doesn't match any groups, you add it to a new group of its own.

In the worst case, this is O(N**2), if every word in your set belongs in its own group. In the best case, if all words in your set end up in the first group, it would be O(N), since you would only compare the first word in the only group to each additional word. If you have a log(N) distribution of sets, this algorithm is effectively O(N log(N)). So, it depends on your input set, but it will result in far fewer comparisons than if you check every pair.

Upvotes: 0

georg
georg

Reputation: 214959

If I understood correctly, you're looking for a way to check if an arbitrary relation, given as a list of pairs x,y is a function, that is, x1==x2 implies y1==y2. This can be done easily with sets:

def is_function(rel):
    return len(set(rel)) == len(set(x for x, y in rel))


print is_function(['ab', 'cd', 'xd']) # yes
print is_function(['ab', 'cd', 'ad']) # no

Two words are "equivalent" in the sense of your question if their letter-to-letter relation is a function:

def equivalent(a, b):
    return is_function(zip(a, b))

print equivalent('foo', 'baa') # yes
print equivalent('foo', 'bar') # no

If you consider equivalences between different words as different relations, there's no way you can avoid comparing each one with each. Moreover, your "equivalence" isn't even commutative, A ~ B doesn't imply B ~ A (e.g abc ~ xyx, but xyx !~ abc).

From your comment, your relation turns out to be bijective (note: your code is not correct for this case). The simplest (not necessarily the most efficient) way to split the list into "equivalence classes" would be to calculate a "hash" for each word, replacing letters with numbers, where 0 stands for the first letter, 1 for the second etc:

def eq_hash(word):
    return tuple(word.index(c) for c in word)

print eq_hash('mom') # 0 1 0
print eq_hash('dad') # 0 1 0

Now, you can group together all words that have the same "hash". These will be equivalent in the context of your question:

group = {}

words = ['mom', 'dad', 'aaa', 'bob', 'ccc', 'xyz', 'abc']

for w in words:
    h = eq_hash(w)
    group[h] = group.get(h, []) + [w]

print group.values()
# [['xyz', 'abc'], ['mom', 'dad', 'bob'], ['aaa', 'ccc']]

Upvotes: 3

Related Questions