Reputation: 426
I am trying to train my model on data that is 50 MB in size. I was wondering if there is a rule or algorithm for determining the embedding dimension from the amount of training data.
Upvotes: 2
Views: 388
Reputation: 22634
I would estimate a 50 MB text file at about 500,000 sentences, or roughly 5 million tokens. That is far too small to train a meaningful embedding, but the empirical results in the GloVe paper (trained on 6 billion tokens) are a useful reference point for choosing a dimension.
Source: https://nlp.stanford.edu/pubs/glove.pdf
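As a minimal sketch (not from the paper above): one common rule of thumb is to set the dimension to roughly the fourth root of the vocabulary size, then validate on a downstream task. The heuristic, the `suggest_dim` helper, and the toy corpus below are all illustrative assumptions; only the gensim `Word2Vec` API is real.

```python
from gensim.models import Word2Vec  # assumes gensim 4.x

def suggest_dim(vocab_size: int) -> int:
    # Heuristic only (an assumption, not from the GloVe paper):
    # fourth root of vocabulary size, with a small floor.
    return max(8, round(vocab_size ** 0.25))

# Toy corpus: a list of tokenized sentences (stand-in for your 50 MB file).
sentences = [
    ["the", "quick", "brown", "fox"],
    ["jumps", "over", "the", "lazy", "dog"],
]

vocab_size = len({tok for sent in sentences for tok in sent})
dim = suggest_dim(vocab_size)  # 50-300 is typical for real corpora

# vector_size is the embedding dimension in gensim 4.x.
model = Word2Vec(sentences=sentences, vector_size=dim, min_count=1, workers=4)
print(model.wv["fox"].shape)  # (dim,)
```

In practice you would sweep a few values of `vector_size` (e.g. 50, 100, 200, 300, the sizes GloVe reports) and pick the one that performs best on your actual task, since no formula substitutes for validation.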
Upvotes: 1