Reputation: 159
I have always used the spaCy library with English or German.
To load the library I used this code:
import spacy
nlp = spacy.load('en')
I would like to use the Spanish tokeniser, but I do not know how, because spaCy does not seem to have a Spanish model. I've tried this:
python -m spacy download es
and then:
nlp = spacy.load('es')
But without any success.
Does someone know how to tokenise a Spanish sentence with spaCy in the proper way?
Upvotes: 5
Views: 6641
Reputation: 2150
You have to download a Spanish language model from the command line. Two pretrained Spanish models are currently available, es_core_news_sm and es_core_news_md ("es" stands for Spanish, "sm" for the small model size, "md" for the medium one).
Choose the small or the medium version and download it:
python -m spacy download es_core_news_sm
python -m spacy download es_core_news_md
Then load the model of your choice in Python by its name:
import spacy
nlp = spacy.load("es_core_news_sm") # or spacy.load("es_core_news_md")
# do something with the model, e.g. tokenize the text
text_in_spanish = "Esta es una frase de ejemplo."  # example input
doc = nlp(text_in_spanish)
for token in doc:
    print(token.text)
Check the documentation for model updates: https://spacy.io/models/es
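If you only need tokenisation, a fallback along these lines may help. This is a minimal sketch, assuming spaCy 2.x or later: it tries the downloaded model first and, if the package is not installed, falls back to a tokenizer-only Spanish pipeline from spacy.blank("es"). The helper names load_spanish and tokenize are made up for this example and are not part of spaCy.

```python
def load_spanish(model_name="es_core_news_sm"):
    """Return a Spanish pipeline; fall back to a tokenizer-only one.

    Hypothetical helper for illustration, not part of spaCy.
    """
    import spacy  # imported here so the helper can be defined without spaCy

    try:
        return spacy.load(model_name)
    except OSError:
        # spacy.load raises OSError when the model package is not installed.
        # spacy.blank("es") has no tagger or parser, only Spanish tokenization.
        return spacy.blank("es")


def tokenize(text, nlp=None):
    """Split text into token strings using the given or a default pipeline."""
    if nlp is None:
        nlp = load_spanish()
    return [token.text for token in nlp(text)]
```

Calling tokenize("Hola mundo") should give the two tokens "Hola" and "mundo". The blank pipeline is enough for splitting text, but you still need one of the downloaded models for part-of-speech tags or named entities.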
Upvotes: 1
Reputation: 868
This works for me:
python -m spacy download es_core_news_sm
import spacy
nlp = spacy.load("es_core_news_sm")
Upvotes: 0
Reputation: 159
For spaCy versions up to 1.6, this code works properly:
from spacy.es import Spanish
nlp = Spanish()
In version 1.7.2, however, a small change is necessary:
from spacy.es import Spanish
nlp = Spanish(path=None)
Source: @honnibal in the Gitter chat
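For readers on newer releases: as far as I know, the language classes moved to the spacy.lang package in spaCy 2.x, so the import path above no longer applies there. A rough sketch under that assumption (the function name spanish_tokens is invented for this example):

```python
def spanish_tokens(text):
    """Tokenize Spanish text with the bare language class (no model download).

    Assumes spaCy 2.x or later, where Spanish lives in spacy.lang.es.
    """
    from spacy.lang.es import Spanish  # in spaCy 1.x this was spacy.es

    nlp = Spanish()  # tokenizer only, no statistical components
    return [token.text for token in nlp(text)]
```

This mirrors the Spanish() approach above: nothing has to be downloaded, but you only get tokens, not tags or entities.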
Upvotes: 6