Abhishek Thakur

Reputation: 17015

replace all words in a big list

I have a list of documents like:

documents = [ 'this is document number 1',
              'this is document number 2',
              'this is document number 3',
              ...]

and a vector of around 200k words: wordVector = ['word1', 'word2', ..., 'rare_word']

where 'rare_word' is the last word in the vector. For each word in wordVector I also have a 1x2 vector that represents it, so the representations for the complete wordVector form an Nx2 array.

Now I want to replace every word in documents with its representation, using wordVector and the Nx2 array. If a word is not found, or the document is empty, it should be assigned the last row of the Nx2 array. Right now I loop over the documents, search wordVector for each word, and replace it. Since the dataset is huge, this takes a lot of time. Is there a fast/pythonic way to accomplish this?

Upvotes: 1

Views: 119

Answers (1)

dornhege

Reputation: 1500

Make it a dictionary and try something like:

replacedWord = wordDict.get(currentWord, 'rare_word')

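This should get you the matching replacement entry from the dictionary, and it returns the default 'rare_word' if there is no such entry. If the dictionary maps words directly to their representations, pass the representation of 'rare_word' as the default instead of the string.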
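A minimal sketch of the dictionary approach, assuming the representations are available as a sequence of 1x2 rows aligned with wordVector. The toy data and the names representations, wordDict, rareRep and encode are hypothetical, chosen only for illustration, and the documents are assumed to be split on whitespace:

```python
# Hypothetical toy data standing in for the ~200k-word vocabulary.
wordVector = ['this', 'is', 'document', 'number', 'rare_word']
representations = [[0.1, 0.2],
                   [0.3, 0.4],
                   [0.5, 0.6],
                   [0.7, 0.8],
                   [9.0, 9.0]]   # last row belongs to 'rare_word'

documents = ['this is document number 1', '']

# Build the lookup table once: word -> its 1x2 representation.
wordDict = dict(zip(wordVector, representations))
rareRep = representations[-1]

def encode(document):
    """Replace each whitespace-separated word with its representation."""
    rows = [wordDict.get(word, rareRep) for word in document.split()]
    # An empty document gets the rare-word representation.
    return rows if rows else [rareRep]

encoded = [encode(doc) for doc in documents]
```

Each lookup is a constant-time hash lookup on average, instead of a linear scan over wordVector, so the dictionary is built once and then reused for every document. The same pattern should work if representations is a NumPy array, since zip then pairs each word with one row.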

Upvotes: 3
