Reputation: 859
Spacy has great parsing capabilities and its API is very intuitive for the most part. Is there any way, from the Spacy API, to fine-tune its word embedding models? In particular, I would like to keep Spacy's tokens and give them a vector where possible.
The only thing I've come across so far is to train the embeddings using gensim (though I wouldn't know how to pass spacy's tokens to gensim) and then load them back into spacy, as in: https://www.shanelynn.ie/word-embeddings-in-python-with-spacy-and-gensim/. This doesn't help with the first part: training on spacy tokens.
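To illustrate, here is roughly the pipeline I have in mind (an untested sketch; the corpus texts and the parameters are just placeholders, and size is called vector_size in gensim 4.x):

import spacy
from gensim.models import Word2Vec

nlp = spacy.load('en_core_web_sm')
texts = ['First document.', 'Second document.']  # placeholder corpus

# tokenize with spacy so the embeddings are keyed by spacy's tokens
sentences = [[tok.text for tok in doc] for doc in nlp.pipe(texts)]

# train word2vec on the spacy-tokenized sentences
w2v = Word2Vec(sentences, size=300, min_count=1)  # 'vector_size' in gensim 4.x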
Any help appreciated.
Upvotes: 1
Views: 1353
Reputation: 61930
From the spacy documentation:
If you need to train a word2vec model, we recommend the implementation in the Python library Gensim.
Besides gensim, you can also use other implementations such as FastText. The easiest way to use custom vectors from spacy is to create a model with the init-model command-line utility, like this:
wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.la.300.vec.gz
python -m spacy init-model en model --vectors-loc cc.la.300.vec.gz
then simply load your model as usual: nlp = spacy.load('model'). There is detailed documentation on the spacy website.
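For completeness, here is a rough sketch of the full round trip, assuming a gensim Word2Vec model w2v trained on spacy tokens as in the question (file and model names are illustrative):

# save the gensim vectors in the plain word2vec text format that init-model accepts
w2v.wv.save_word2vec_format('custom_vectors.txt')

then on the command line:

python -m spacy init-model en custom_model --vectors-loc custom_vectors.txt

and back in Python:

import spacy
nlp = spacy.load('custom_model')
doc = nlp('some text to check')
print(doc[0].has_vector, doc[0].vector[:5])  # tokens are now backed by the custom vectors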
Upvotes: 0