Reputation:
I know how to obtain a document vector for a given tag in doc2vec using print(model.docvecs['recipe__11']).
My document vectors are either recipes (tags start with recipe__), newspapers (tags start with news__), or ingredients (tags start with ingre__).
Now I want to retrieve all the document vectors of recipes. The tags of my recipe documents follow the pattern recipe__<some number> (e.g., recipe__23, recipe__34). Is it possible to obtain multiple document vectors using a pattern (e.g., all tags starting with recipe__)?
Please help me!
Upvotes: 2
Views: 2330
Reputation: 54173
There's no pattern-retrieval, but you can access the list of all known (string) doc-tags in model.docvecs.offset2doctag. You could then loop over that list to find all matches, and retrieve each individually.
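A minimal sketch of that loop, assuming only what is described above: an object with an offset2doctag list of string tags that can also be indexed by tag. FakeDocvecs and vectors_with_prefix are hypothetical names, and the stand-in class exists only so the example runs without a trained model; with a real model you would pass model.docvecs instead.

```python
# Hypothetical stand-in for model.docvecs, so the sketch runs without gensim.
class FakeDocvecs:
    def __init__(self, vectors_by_tag):
        # List of known string doc-tags, in insertion order.
        self.offset2doctag = list(vectors_by_tag)
        self._vectors = dict(vectors_by_tag)

    def __getitem__(self, tag):
        # Look up a single vector by its tag, like model.docvecs['recipe__11'].
        return self._vectors[tag]


def vectors_with_prefix(docvecs, prefix):
    """Return {tag: vector} for every known doc-tag that starts with prefix."""
    return {
        tag: docvecs[tag]
        for tag in docvecs.offset2doctag
        if tag.startswith(prefix)
    }


# Made-up two-dimensional vectors, purely for illustration.
docvecs = FakeDocvecs({
    "recipe__11": [0.1, 0.2],
    "news__3": [0.3, 0.4],
    "recipe__23": [0.5, 0.6],
    "ingre__7": [0.7, 0.8],
})

recipe_vectors = vectors_with_prefix(docvecs, "recipe__")
print(sorted(recipe_vectors))
```

Returning a dict keeps each vector attached to its tag, which avoids any bookkeeping about positions.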
Also, all the doc-vectors are in a large array, model.docvecs.doctag_syn0. And, if you've used exclusively string doc-tags, then the position of a tag in offset2doctag will be exactly the index of the corresponding vector in doctag_syn0. That would allow you to use numpy 'mask indexing' to grab a subset of vectors as a new array, like:
recipes_mask = [tag.startswith('recipe__') for tag in model.docvecs.offset2doctag]
recipes_vectors = model.docvecs.doctag_syn0[recipes_mask]
Of course, this array-of-vectors no longer has the recipes in the same positions as the original, so you'd need extra steps to know where (for example) the 'recipe__11' vector is in recipes_vectors.
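One possible way to handle those extra steps is to record the original positions of the matching tags and build a tag-to-row lookup for the filtered array. This is only a sketch: rows_with_prefix is a hypothetical helper, and the tags and vectors lists below are made-up stand-ins for model.docvecs.offset2doctag and model.docvecs.doctag_syn0 so that it runs without a trained model.

```python
def rows_with_prefix(tags, prefix):
    """Return (rows, tag_to_row) for the tags that start with prefix.

    rows       : positions of the matching tags in the original list
    tag_to_row : maps each matching tag to its row in the filtered subset
    """
    rows = [i for i, tag in enumerate(tags) if tag.startswith(prefix)]
    tag_to_row = {tags[i]: new_row for new_row, i in enumerate(rows)}
    return rows, tag_to_row


# Stand-ins for offset2doctag and doctag_syn0 (illustrative values only).
tags = ["news__3", "recipe__11", "ingre__7", "recipe__23"]
vectors = [[0.3, 0.4], [0.1, 0.2], [0.7, 0.8], [0.5, 0.6]]

rows, tag_to_row = rows_with_prefix(tags, "recipe__")

# Plain-list equivalent of selecting those rows from the full array.
recipes_vectors = [vectors[i] for i in rows]

# Find the 'recipe__11' vector inside the filtered subset.
print(recipes_vectors[tag_to_row["recipe__11"]])
```

With a numpy array, indexing by the rows list selects the same rows, in the same order, as the boolean mask, so tag_to_row can be used directly against that result.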
Upvotes: 6