gaurus

Reputation: 426

What should the vector dimension be for the word2vec algorithm with 50 MB of data?

I am trying to train my model on data that is 50 MB in size. I was wondering whether there is a rule or algorithm for determining the vector dimension for the algorithm.

Upvotes: 2

Views: 388

Answers (1)

aerin

Reputation: 22634

I would estimate that a 50 MB text file holds about 500,000 sentences, or 5 million tokens. That is far too small to train a meaningful embedding. However, here is some empirical data (from models trained on 6 billion tokens) that you could refer to.


Source: https://nlp.stanford.edu/pubs/glove.pdf
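
The size estimate at the start of this answer can be reproduced as rough arithmetic. This is only a back-of-the-envelope sketch: the function name `estimate_corpus_size` is hypothetical, and the averages of about 10 bytes per token and 10 tokens per sentence are assumptions chosen to match the figures above, not measurements of any particular corpus.

```python
def estimate_corpus_size(num_bytes, bytes_per_token=10, tokens_per_sentence=10):
    """Rough token and sentence counts for a plain-text corpus.

    bytes_per_token and tokens_per_sentence are assumed averages,
    not values measured from real data.
    """
    tokens = num_bytes // bytes_per_token
    sentences = tokens // tokens_per_sentence
    return tokens, sentences


# Treat 50 MB as 50,000,000 bytes (an assumption; 50 MiB would be slightly more).
tokens, sentences = estimate_corpus_size(50 * 10**6)
print(f"~{tokens:,} tokens, ~{sentences:,} sentences")
```

For a real corpus, counting whitespace-separated tokens in the file directly would give a better number than these assumed averages.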

Upvotes: 1
