replace all words in a big list

Question

I have a list of documents like:

documents = ['this is document number 1',
             'this is document number 2',
             'this is document number 3',
             ...]

and a vector of around 200k words: wordVector = ['word1', 'word2', ..., 'rare_word']

where 'rare_word' is the last word in the vector. For each word in wordVector I also have a 1x2 vector, so the representations for the whole vocabulary form an Nx2 array.

Now I want to replace every word in documents with its corresponding representation, using wordVector and the Nx2 array. If a word is not found, or the document is empty, it should be assigned the last row of the Nx2 array. Right now I loop over the words, search for each one in wordVector, and then replace it. Because the dataset is huge, this takes a long time. Is there a fast, Pythonic way to accomplish this?

dornhege · Accepted Answer

Make wordVector a dictionary keyed by word and try something like:

replacedWord = wordDict.get(currentWord, 'rare_word')

This should get you the matching replacement entry from the dictionary and will use 'rare_word' if there is no such entry.
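
To make this concrete, here is a minimal sketch of how the dictionary could be built and applied to the whole list. It is only one possible reading of the question: the toy vocabulary, the invented numbers, the name `representations` for the Nx2 array, the helper `replace_words`, and whitespace splitting are all assumptions, and an empty document is assumed to map to a single fallback entry. The speedup comes from dictionary lookups taking constant time on average, instead of scanning the 200k-word list for every word.

```python
# Hypothetical toy data standing in for the 200k-word vocabulary.
wordVector = ['this', 'is', 'document', 'rare_word']
# One 2-element representation per word, in the same order as wordVector
# (values are invented for illustration).
representations = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [9.0, 9.0]]

# Build the lookup table once: word -> representation.
wordDict = dict(zip(wordVector, representations))

# Fallback for unknown words: the last row, which belongs to 'rare_word'.
rare = representations[-1]


def replace_words(document):
    """Return one representation per whitespace-separated word."""
    words = document.split()
    if not words:
        # Assumption: an empty document becomes a single fallback entry.
        return [rare]
    return [wordDict.get(word, rare) for word in words]


documents = ['this is document number 1', '']
replaced = [replace_words(doc) for doc in documents]
```

Here the default passed to `get` is the representation of 'rare_word' rather than the string itself, so every entry in `replaced` has the same shape. If the representations live in a NumPy array, indexing its last row should work the same way.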
