In the `doc2vec` function, there is a parameter called `size`.
I understand that `size` is the dimension of the output vector, and that with `size=400` it will capture the content better than with `size=100`.
However, I do not understand what `size` actually stands for. Does it mean how far Doc2Vec will look from a word to predict the next word? Or what does it mean?
Thanks a lot,
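`size` is the number of dimensions in the created vectors. So `size=100` means each document (more precisely, each document tag) receives a 100-dimensional vector from training. It has nothing to do with how far the model looks around a word; that distance is set by a separate parameter, `window`.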
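To picture what `size` controls, here is a minimal pure-Python sketch. It is not gensim's implementation: the toy corpus of `(words, tags)` pairs and the `init_tag_vectors` helper are hypothetical, and only show the shape of the result, one vector of `size` numbers per distinct tag. Because vectors belong to tags, the two documents tagged `sports` share a single vector. (In gensim itself this parameter is called `size` before version 4.0 and `vector_size` from 4.0 onward.)

```python
import random

# Hypothetical toy corpus: (words, tags) pairs, loosely mimicking the shape
# of gensim's TaggedDocument. Two documents share the tag "sports".
corpus = [
    (["the", "match", "ended", "late"], ["doc_0", "sports"]),
    (["a", "new", "stadium", "opened"], ["doc_1", "sports"]),
    (["rates", "rose", "again"], ["doc_2"]),
]

def init_tag_vectors(corpus, size, seed=0):
    """Hypothetical helper: give every distinct tag one `size`-dimensional
    vector of small random starting values. Training would then adjust
    these numbers; this sketch only shows how many there are."""
    rng = random.Random(seed)
    tags = sorted({tag for _, doc_tags in corpus for tag in doc_tags})
    return {tag: [(rng.random() - 0.5) / size for _ in range(size)] for tag in tags}

vectors = init_tag_vectors(corpus, size=100)
print(sorted(vectors))         # ['doc_0', 'doc_1', 'doc_2', 'sports']
print(len(vectors["sports"]))  # 100
```

Three documents produce four vectors here, because the count depends on distinct tags rather than on documents.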
More dimensions aren't always better: they mean slower training and a larger model. And on a small dataset, too many dimensions risk overfitting, where the model memorizes quirks of individual documents instead of representing generalizable patterns in the data.
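The "larger model" point is simple arithmetic. Below is a rough estimate under stated assumptions, not a measurement of any library: the hypothetical `approx_model_bytes` helper counts three weight tables, each `size` numbers wide (word vectors and output-layer weights with one row per vocabulary word, plus one row per document tag), stored as 4-byte floats. The vocabulary and tag counts are made-up illustrative values, and real memory use depends on the training mode and implementation.

```python
def approx_model_bytes(vocab_size, num_tags, size, bytes_per_float=4):
    """Hypothetical rough estimate. Assumes three weight tables, each
    `size` floats wide: word vectors and output-layer weights (one row
    per vocabulary word each) plus doc-tag vectors (one row per tag)."""
    rows = 2 * vocab_size + num_tags
    return rows * size * bytes_per_float

# Illustrative (made-up) corpus dimensions.
for size in (100, 400):
    mb = approx_model_bytes(vocab_size=50_000, num_tags=10_000, size=size) / 1e6
    print(f"size={size}: ~{mb:.0f} MB")
# size=100: ~44 MB
# size=400: ~176 MB
```

Under these assumptions memory grows linearly with `size`, so going from 100 to 400 quadruples these tables, and training work per word grows in the same proportion.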