How to add missing word vectors to the GoogleNews-vectors-negative300.bin pre-trained model?

Question

I am using the gensim word2vec library in Python with the pre-trained GoogleNews-vectors-negative300.bin model. Some words in my corpus have no vector in the model, so looking them up raises a KeyError. How do I solve this problem?

Here is what I have tried so far:

1: Loading the `GoogleNews-vectors-negative300.bin` pre-trained model:

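from gensim.models import KeyedVectors
# Recent gensim releases expose this loader on KeyedVectors rather than Word2Vec.
model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
print("model loaded...")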

2: Build a vector for each tweet in the training set by averaging the vectors of all words in the tweet, then scale:

import numpy as np

def buildWordVector(text, size):
    # Sum the vectors of every word the model knows, then divide by the count.
    vec = np.zeros(size).reshape((1, size))
    count = 0.
    for word in text:
        try:
            vec += model[word].reshape((1, size))
            count += 1.
            # print("found! " + word)
        except KeyError:
            print("not found! " + word)  # missing words
            continue
    if count != 0:
        vec /= count
    return vec

trained_vecs = np.concatenate([buildWordVector(z, n_dim) for z in x_train])

How can I add new words to a pre-trained word2vec model?
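
To make the goal concrete, here is a minimal sketch of one possible workaround, not a confirmed solution: instead of skipping a missing word, keep a separate side table that assigns it a small random vector and reuses that vector on later lookups. The names `build_word_vector`, `extra_vectors`, and `toy_vectors` are hypothetical, the range of the random values is an arbitrary choice, and a plain dict stands in for the loaded model (the sketch assumes the model supports `word in model` and `model[word]` in the same way).

```python
import numpy as np

def build_word_vector(tokens, vectors, size, extra_vectors=None, rng=None):
    """Average the vectors of `tokens`, creating fallback vectors for unknown words.

    `vectors` is any mapping supporting `word in vectors` and `vectors[word]`
    (a plain dict here, assumed to behave like the loaded model).
    `extra_vectors` is a hypothetical side table for words the model lacks.
    """
    if extra_vectors is None:
        extra_vectors = {}
    if rng is None:
        rng = np.random.default_rng(0)
    vec = np.zeros((1, size))
    count = 0
    for word in tokens:
        if word in vectors:
            word_vec = vectors[word]
        else:
            # Assumption: give each unseen word one small random vector and reuse it.
            if word not in extra_vectors:
                extra_vectors[word] = rng.uniform(-0.25, 0.25, size)
            word_vec = extra_vectors[word]
        vec += np.asarray(word_vec).reshape((1, size))
        count += 1
    if count:
        vec /= count
    return vec

# Toy stand-in for the pre-trained model (3 dimensions instead of 300).
toy_vectors = {"good": np.array([1.0, 0.0, 0.0]), "day": np.array([0.0, 1.0, 0.0])}
extra = {}
tweet_vec = build_word_vector(["good", "day", "lol"], toy_vectors, 3, extra_vectors=extra)
```

With this approach the pre-trained file itself is left unchanged; the random vectors carry no semantic information and only prevent the missing words from being dropped from the average.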
