Luca Ambrosini

Reputation: 159

Use spacy Spanish Tokenizer

I have always used the spaCy library with English or German.

To load the library I used this code:

import spacy
nlp = spacy.load('en')

I would like to use the Spanish tokeniser, but I do not know how, because spaCy does not have a Spanish model. I tried this:

python -m spacy download es

and then:

nlp = spacy.load('es')

But obviously without any success.

Does anyone know how to tokenise a Spanish sentence with spaCy in the proper way?

Upvotes: 5

Views: 6641

Answers (3)

lux7

Reputation: 2150

You will have to download a Spanish language model using the command line. In the model names, "es" stands for Spanish, "sm" for the small model size, and "md" for the medium model size. Currently two pretrained Spanish models are available:

  • es_core_news_sm
  • es_core_news_md

Choose the small or medium version and download it using the command line (you only need to run the line for the model you chose):

python -m spacy download es_core_news_sm
python -m spacy download es_core_news_md

Then load the model of your choice in Python using its name:

import spacy
nlp = spacy.load("es_core_news_sm") # or spacy.load("es_core_news_md")

# do something with the model, e.g. tokenize the text
text_in_spanish = "Hola, ¿cómo estás?"  # example sentence; replace with your own text
doc = nlp(text_in_spanish)
for token in doc:
    print(token.text)

Check the documentation for model updates: https://spacy.io/models/es
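
If only tokenization is needed, a blank Spanish pipeline may be enough, with no model download. The following is a minimal sketch, assuming spaCy 2.0 or later, where spacy.blank is available. The sample sentence is made up for illustration, and a blank pipeline provides the tokenizer only (no tagger, parser, or named entity recognizer).

```python
import spacy

# Assumption: spaCy >= 2.0. spacy.blank("es") builds a Spanish pipeline
# that contains only the language-specific tokenizer, so no pretrained
# model package has to be downloaded.
nlp = spacy.blank("es")

# Illustrative sentence, not taken from the original question.
doc = nlp("Hola mundo.")

# Collect the token strings produced by the Spanish tokenizer.
tokens = [token.text for token in doc]
print(tokens)
```

For part-of-speech tags, dependencies, or entities, one of the pretrained models listed above is still required.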

Upvotes: 1

Ofer Rahat

Reputation: 868

This works for me:

python -m spacy download es_core_news_sm


import spacy
nlp = spacy.load("es_core_news_sm")

Upvotes: 0

Luca Ambrosini

Reputation: 159

For spaCy versions up to 1.6, this code works properly:

from spacy.es import Spanish
nlp = Spanish()

In version 1.7.2, a small change is necessary:

from spacy.es import Spanish
nlp = Spanish(path=None)

Source: @honnibal in the Gitter chat

Upvotes: 6
