Simd
Simd

Reputation: 21333

How to filter out words with at least one unique letter

I have long list of strings called words. All the strings have the same length which is 5. For any i in {0,...,4} we can check to see if the i'th letter of a string is also the i'th letter of any other string in the list. We call a letter at the i'th position unique if it is not.

I would like to remove all strings from my list for which there exists any i for which the i'th letter is unique.

As a very simple example consider: words = ["apple", "amber", "bpple", "bmber", "appld"]. The string "appld" should be removed because d is unique at the 4th position.

Is there a neat way to do this?

Upvotes: 1

Views: 187

Answers (2)

Bing Wang
Bing Wang

Reputation: 1598

this can be done in O(NlogN) but space inefficient

d={}
for j,w in enumerate(words):
    for i,c in enumerate(w):
        if (i,c) in d:
            d[(i,c)].append(j)
        else:
            d[(i,c)]=[j]
for i in reversed(list(set([v[0] for v in d.values() if len(v)==1]))):
    words.pop(i)
print(words)

Upvotes: 1

user2390182
user2390182

Reputation: 73480

Using collections.Counter and the zip(*...) transposition idiom:

from collections import Counter

# counts for every index
bypos = [*map(Counter, zip(*words))] 

# disallow count 1 for any letter x in its respective position
words = [w for w in words if all(c[x]!=1 for x, c in zip(w, bypos))]
# ['apple', 'amber', 'bpple', 'bmber']

Note that is better to rebuild the list in a single iteration than to remove elements repeatedly.

Some docs on the utils used here:

Upvotes: 4

Related Questions