Speed up n-gram processing

Question

I have a list, uniqueWordList, containing a large number of words (100,000+). The trigrams of all those words are stored in the set allTriGrams.
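
For reference, a set like allTriGrams can be built from the word list roughly as follows. This is a minimal sketch that assumes a trigram is a contiguous 3-character substring with no start/end padding. The sample words and the trigrams helper are hypothetical and not taken from the question.

```python
# Hypothetical sample vocabulary; the real uniqueWordList has 100,000+ words.
uniqueWordList = ["banana", "banned"]

def trigrams(word):
    """Return the set of contiguous 3-character substrings of word (no padding)."""
    # Words shorter than 3 characters yield an empty set.
    return {word[i:i + 3] for i in range(len(word) - 2)}

# Union of every word's trigrams.
allTriGrams = set()
for word in uniqueWordList:
    allTriGrams |= trigrams(word)
```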

I want to build a dictionary that has each unique trigram as a key and, as its value, the list of all words that contain that trigram.

Example:

epicDict = {'ban': ['banana', 'banned'], 'nan': ['banana']}

My code so far:

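epicDict = {}
# Test every trigram against every word:
# len(allTriGrams) * len(uniqueWordList) substring checks in total.
for value in allTriGrams:
    for word in uniqueWordList:
        if value in word:
            epicDict.setdefault(value, []).append(word)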

My problem: this approach is very slow. Is there any way to speed it up?

idjaw · Accepted Answer

What if uniqueWordList were a set instead? Then you could do this:

if value in uniqueWordList:
    epicDict.setdefault(value,[]).append(word)
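
A related way to get hash-based lookups out of the problem is to invert the loop (an inverted index, a technique swapped in here rather than the code above): instead of testing every trigram against every word, walk each word once, generate its own trigrams, and append the word under each of them. This is a minimal sketch, assuming trigrams are contiguous 3-character substrings with no padding, as in the question's example. The trigrams and build_trigram_index helpers and the sample words are hypothetical.

```python
from collections import defaultdict

def trigrams(word):
    """Contiguous 3-character substrings of word, deduplicated (assumes no padding)."""
    return {word[i:i + 3] for i in range(len(word) - 2)}

def build_trigram_index(words):
    """Map each trigram to the list of words that contain it.

    One pass over the words, so the work grows with the total number of
    characters instead of len(allTriGrams) * len(words) substring tests.
    """
    index = defaultdict(list)
    for word in words:
        # Deduplicating first keeps "banana" (which contains "ana" twice)
        # from being listed twice under "ana".
        for tri in trigrams(word):
            index[tri].append(word)
    return dict(index)

uniqueWordList = ["banana", "banned"]
epicDict = build_trigram_index(uniqueWordList)
print(epicDict["ban"])  # ['banana', 'banned']
print(epicDict["nan"])  # ['banana']
```

Because words are visited in list order, each value list keeps the same order the nested loop would produce. When allTriGrams contains exactly the trigrams of the listed words, the resulting dictionary should match the nested-loop version.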
