Reputation: 65
The following function return the number of words from a list that contain the exact same characters as the word entered. The order of the characters in the words is not important. However, say there is a list that contain millions of words. What is the most efficient and fastest way to perform this search?
Example:
words_list = ['yek','lion','eky','ekky','kkey','opt'];
if we were to match the word "key" with the words in the list, the function only return "yek" and "eky" since they share the same exact characters with "key" regardless of the order.
Below is the function I wrote
def find_a4(words_list, word):
# all possible permutations of the word that we are looking for
# it's a set of words
word_permutations = set([''.join(p) for p in permutations(word)])
word_size = len(word)
count = 0
for word in word_list:
# in the case of word "key",
# we only accept words that have 3 characters
# and they are in the word_permutations
if len(word) == word_size and word in word_permutations:
count += 1
return count
Upvotes: 3
Views: 1491
Reputation: 18950
A dictionary whose key is the sorted version of the word:
word_list = ['yek','lion','eky','ekky','kkey','opt']
from collections import defaultdict
word_index = defaultdict(set)
for word in word_list:
idx = tuple(sorted(word))
word_index[idx].add(word)
# word_index = {
# ('e', 'k', 'y'): {'yek', 'eky'},
# ('i', 'l', 'n', 'o'): {'lion'},
# ('e', 'k', 'k', 'y'): {'kkey', 'ekky'},
# ('o', 'p', 't'): {'opt'}
# }
Then for querying you would do:
def find_a4(word_index, word):
idx = tuple(sorted(word))
return len(word_index[idx])
Or if you need to return the actual words, change it to return word_index[idx]
.
Efficiency: querying runs in average in O(1) time.
Upvotes: 5
Reputation: 2103
For large string, you will have n!
permutations to search. I will sort all the strings before comparison, this will be nlog(n), and will sort and compare only when lengths match -
def find_a4(words_list, word):
word = ''.join(sorted(word))
word_size = len(word)
count = 0
for word1 in words_list:
if len(word1) == word_size:
if word == ''.join(sorted(word1)):
count += 1
return count
Upvotes: 2