Reputation: 17015
I have a list of documents like:
documents = ['this is document number 1',
             'this is document number 2',
             'this is document number 3',
             ...]
and a vector of around 200k words: wordVector = ['word1', 'word2', ..., 'rare_word']
where 'rare_word' is the last word in the vector. Each word in wordVector also has a corresponding 1x2 vector (so an Nx2 array for the complete wordVector), which is the representation of that word.
Now I want to replace every word in documents with its corresponding representation, using wordVector and the Nx2 array. If a word is not found, or the document is empty, it should be assigned the last row of the Nx2 array. Right now I'm using loops: I find each word in wordVector and then replace it. As the dataset is huge, the process takes a lot of time. Is there a fast/Pythonic way to accomplish this?
Upvotes: 1
Views: 119
Reputation: 1500
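Make it a dictionary that maps each word to its representation and try something like:
replacedWord = wordDict.get(currentWord, 'rare_word')
This gets you the matching replacement entry from the dictionary and falls back to 'rare_word' if there is no such entry. Note that the default is returned as-is, so pass the rare word's representation as the default if you want a vector back rather than the string.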
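To make the idea concrete, here is a minimal sketch of one way it could look end to end. It assumes the documents are split on whitespace and uses small plain lists as stand-ins for the real data; the names wordVecs, word_to_index, rare_index and encode are hypothetical and not from the question. A NumPy Nx2 array could be indexed by row in the same way.

```python
# Hypothetical stand-ins for the question's data (the real vocabulary has ~200k words).
wordVector = ['this', 'is', 'document', 'number', 'rare_word']
wordVecs = [[0.1, 0.2],
            [0.3, 0.4],
            [0.5, 0.6],
            [0.7, 0.8],
            [9.0, 9.0]]  # last row: representation of 'rare_word'

documents = ['this is document number 1',
             'this is document number 2',
             '']

# Build the word -> row index mapping once, so each lookup is a
# constant-time dict access instead of a scan over the whole vocabulary.
word_to_index = {word: i for i, word in enumerate(wordVector)}
rare_index = len(wordVector) - 1


def encode(document):
    """Return one 1x2 representation per word in the document.

    Unknown words, and empty documents, fall back to the last row.
    Assumes words are separated by whitespace.
    """
    rows = [wordVecs[word_to_index.get(word, rare_index)]
            for word in document.split()]
    return rows or [wordVecs[rare_index]]


encoded = [encode(doc) for doc in documents]
```

Here '1' and '2' are not in the vocabulary, so they get the last row, and the empty document becomes a single rare-word row. Most of the gain comes from building the dictionary once, which avoids searching the 200k-word list for every word.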
Upvotes: 3