Reputation: 75
I have a uniqueWordList with a lot of words (100,000+). The trigrams of every one of those words are in the set allTriGrams.
I want to build a dictionary that has every unique trigram as a key and, as its value, the list of all words that contain that trigram.
Example:
epicDict = {'ban': ['banana', 'banned'], 'nan': ['banana']}
My code so far:
for value in allTriGrams:
    for word in uniqueWordList:
        if value in word:
            epicDict.setdefault(value, []).append(word)
My problem: This method takes a LOT of time. Is there any way to speed up this process?
Upvotes: 3
Views: 159
Reputation: 4492
Among simple solutions, I expect this to be faster:
import collections

epicDict = collections.defaultdict(set)
for word in uniqueWordList:
    # Visit each word once and generate its own trigrams directly.
    for trigram in [word[x:x+3] for x in range(len(word)-2)]:
        epicDict[trigram].add(word)
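For anyone who wants to try the idea on a small input, here is a minimal self-contained sketch of the same word-first loop. The function name build_trigram_index and the two sample words are my own, purely for illustration. It also converts the sets to sorted lists at the end, on the assumption that the list-valued format from the question is what is wanted.

```python
from collections import defaultdict


def build_trigram_index(words):
    """Map each trigram to the sorted list of words that contain it."""
    index = defaultdict(set)
    for word in words:
        # Slide a 3-character window over the word; words shorter than
        # 3 characters yield an empty range and contribute nothing.
        for start in range(len(word) - 2):
            index[word[start:start + 3]].add(word)
    # Convert the sets to sorted lists to match the list-valued dict
    # shown in the question.
    return {trigram: sorted(matches) for trigram, matches in index.items()}


sample_words = ["banana", "banned"]  # hypothetical sample data
trigram_index = build_trigram_index(sample_words)
print(trigram_index["ban"])  # ['banana', 'banned']
print(trigram_index["nan"])  # ['banana']
```

The work here grows with the total number of characters across all words, rather than with the number of trigrams multiplied by the number of words, which is where I would expect the speed-up to come from.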
Upvotes: 0
Reputation: 26578
What if uniqueWordList were a set instead of a list? Then you could do this:
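# Caution: on a set, `in` is an exact-match lookup, not a substring test,
# so this fires only when the trigram itself is a word (`word` is undefined here).
if value in uniqueWordList:
    epicDict.setdefault(value, []).append(word)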
Check this out: Python Sets vs Lists
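One thing worth checking before relying on this: a quick sketch of how membership in a set differs from the substring test in the question. The sample words are hypothetical.

```python
words = {"banana", "banned"}  # hypothetical sample data

# Set membership is an exact-match lookup: "ban" is not itself a word here.
print("ban" in words)  # False

# A substring test against each word is what the loop in the question performs.
substring_matches = sorted(w for w in words if "ban" in w)
print(substring_matches)  # ['banana', 'banned']
```

So a set speeds up the question "is this exact string one of my words?", but it does not by itself answer "which words contain this trigram?".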
Upvotes: 2