Niels Helsø

Reputation: 45

How to create a DataFrame with the word2vec vectors as data and the terms as row labels?

I tried to follow this notebook: nbviewer.jupyter.org/github/skipgram/modern-nlp-in-python/blob/master/executable/Modern_NLP_in_Python.ipynb, which contains the following code snippet:

ordered_vocab = [(term, voc.index, voc.count)
             for term, voc in food2vec.vocab.iteritems()]

ordered_vocab = sorted(ordered_vocab, key=lambda (term, index, count): -count)

ordered_terms, term_indices, term_counts = zip(*ordered_vocab)

word_vectors = pd.DataFrame(food2vec.syn0norm[term_indices, :],
                        index=ordered_terms)

To get it to run, I changed it to the following:

ordered_vocab = [(term, voc.index, voc.count)
             for term, voc in word2vecda.wv.vocab.items()]
ordered_vocab = sorted(ordered_vocab)
ordered_terms, term_indices, term_counts = zip(*ordered_vocab)
word_vectorsda = pd.DataFrame(word2vecda.wv.syn0norm[term_indices,],index=ordered_terms)
word_vectorsda [:20]

But the line that builds the DataFrame (the one just before I print it) gives me an error I cannot get my head around. It keeps reporting that a 'NoneType' object is not subscriptable. To me, it looks like term_indices is what triggers it, but I do not understand why.

 TypeError: 'NoneType' object is not subscriptable

Can anyone help me with this? Any input is most welcome. Best, Niels

Upvotes: 0

Views: 3061

Answers (1)

Harman

Reputation: 1208

Use the following code:

ordered_vocab = [(term, voc.index, voc.count) for term, voc in model.wv.vocab.items()]
ordered_vocab = sorted(ordered_vocab, key=lambda k: k[2])
ordered_terms, term_indices, term_counts = zip(*ordered_vocab)
word_vectors = pd.DataFrame(model.wv.syn0[term_indices, :], index=ordered_terms)

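Replace model with the name of your own model (food2vec in the notebook, word2vecda in the question). Note that this uses syn0 (the raw vectors) rather than syn0norm, and sorts by count in ascending order.
Tested on Python 3.6.1 with gensim 3.0.0.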
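
Some background on why the error appears: in gensim 3.x, wv.syn0norm starts out as None and is only filled in once the normalized vectors have been computed (for example by init_sims()), so indexing it before that raises the TypeError from the question. The sketch below reproduces this without gensim installed. FakeVocabEntry and FakeKeyedVectors are hypothetical stand-ins written for illustration, not real gensim classes, and the three terms and their vectors are made-up data.

```python
import math


class FakeVocabEntry:
    """Hypothetical stand-in for a gensim 3.x vocab entry (index + count)."""

    def __init__(self, index, count):
        self.index = index
        self.count = count


class FakeKeyedVectors:
    """Hypothetical stand-in mimicking the relevant parts of gensim 3.x `wv`."""

    def __init__(self):
        self.vocab = {
            "pizza": FakeVocabEntry(index=0, count=5),
            "pasta": FakeVocabEntry(index=1, count=9),
            "salad": FakeVocabEntry(index=2, count=2),
        }
        # Raw (unnormalized) vectors, one row per vocab index.
        self.syn0 = [[3.0, 4.0], [0.0, 2.0], [1.0, 0.0]]
        # Like gensim 3.x, the normalized copy does not exist yet.
        self.syn0norm = None

    def init_sims(self):
        # L2-normalize every row of syn0 and store the result in syn0norm.
        self.syn0norm = []
        for row in self.syn0:
            norm = math.sqrt(sum(x * x for x in row))
            self.syn0norm.append([x / norm for x in row])


wv = FakeKeyedVectors()

# Indexing syn0norm before it has been computed reproduces the error.
try:
    wv.syn0norm[0]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# After computing the normalized vectors, the lookup works.
wv.init_sims()

# Sort by count, most frequent first (what the notebook's `-count` key does).
ordered_vocab = sorted(
    ((term, voc.index, voc.count) for term, voc in wv.vocab.items()),
    key=lambda item: -item[2],
)
ordered_terms, term_indices, term_counts = zip(*ordered_vocab)

# Map each term to its normalized vector, in the sorted order.
rows = {term: wv.syn0norm[i] for term, i in zip(ordered_terms, term_indices)}
```

With a real gensim 3.x model, calling model.init_sims() before reading model.wv.syn0norm should have the same effect, after which the pd.DataFrame(...) call from the question can be used unchanged. If unnormalized vectors are acceptable, reading syn0 as in the answer above avoids the extra step.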

Upvotes: 3
