Reputation: 135
I need to understand the difference between spaCy's en and en_core_web_sm models.
I am trying to do NER with spaCy (for organization names). Below is the script I am using:
import spacy
nlp = spacy.load("en_core_web_sm")
text = "But Google is starting from behind. The company made a late push \
into hardware, and Apple’s Siri, available on iPhones, and Amazon’s \
Alexa software, which runs on its Echo and Dot devices, have clear \
leads in consumer adoption."
doc = nlp(text)
# Print each entity with its character offsets and label
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
The script above gives me no output. But when I use the "en" model:
import spacy
nlp = spacy.load("en")
text = "But Google is starting from behind. The company made a late push \
into hardware, and Apple’s Siri, available on iPhones, and Amazon’s \
Alexa software, which runs on its Echo and Dot devices, have clear \
leads in consumer adoption."
doc = nlp(text)
# Print each entity with its character offsets and label
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
it gives me the desired output:
Google 4 10 ORG
Apple’s Siri 92 104 ORG
iPhones 119 126 ORG
Amazon 132 138 ORG
Echo and Dot 182 194 ORG
What is going wrong here? Please help.
Can I use the en_core_web_sm model to get the same output as the en model? If so, please advise how to do it. A Python 3 script that takes a pandas DataFrame as input would be appreciated. Thanks.
Upvotes: 1
Views: 890
Reputation: 16
Loading the model with spacy.load('en_core_web_sm')
instead of spacy.load('en')
should help.
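Since the question asks for organization names from a pandas DataFrame, here is a minimal sketch of the filtering step. The helper name org_entities, the output keys, and the "text" column in the usage comment are all hypothetical; the function only relies on the ents, label_, text, start_char and end_char attributes already used in the question, and takes the loaded pipeline as an argument so it works with whichever model name loads on your installation.

```python
def org_entities(nlp, texts, label="ORG"):
    """Collect entities carrying `label` from each text.

    `nlp` is any callable that returns an object with `.ents`,
    for example a pipeline returned by spacy.load(...).
    Returns a list of dicts, one per matching entity.
    """
    rows = []
    for row_id, text in enumerate(texts):
        doc = nlp(text)
        for ent in doc.ents:
            if ent.label_ == label:
                rows.append({
                    "row": row_id,            # position of the source text
                    "text": ent.text,
                    "start": ent.start_char,
                    "end": ent.end_char,
                    "label": ent.label_,
                })
    return rows


# Typical use, assuming spaCy, pandas and a DataFrame `df` with a
# "text" column (the column name is an assumption):
#   import pandas as pd
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   orgs = pd.DataFrame(org_entities(nlp, df["text"].tolist()))
```

The list of dicts can be passed straight to pandas.DataFrame, and the "row" value lets you join the entities back to the original rows.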
Upvotes: 0
Reputation: 2079
Each model is a machine learning model trained on a specific corpus (a text dataset). As a result, models can tag the same text differently, especially since some were trained on less data than others.
Currently spaCy offers 4 models for English, as listed at: https://spacy.io/models/en/
According to https://github.com/explosion/spacy-models, a model can be downloaded in several distinct ways:
# download best-matching version of specific model for your spaCy installation
python -m spacy download en_core_web_sm
# out-of-the-box: download best-matching default model
python -m spacy download en
Probably, when you downloaded the 'en' model, the best matching default model was not 'en_core_web_sm'.
Also, keep in mind that these models are updated every once in a while, which may have caused you to have two different versions of the same model.
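One way to check whether 'en' and 'en_core_web_sm' resolve to the same package and version is to compare the meta dictionary of each loaded pipeline. The sketch below assumes the "lang", "name" and "version" keys that spaCy pipelines expose in nlp.meta; the helper names describe_pipeline and same_model are made up for illustration. Note that, as far as I know, the 'en' shortcut only exists in spaCy v2 and was removed in v3, where the full package name is required.

```python
def describe_pipeline(meta):
    """Return '<lang>_<name> <version>' built from a pipeline's meta dict."""
    lang = meta.get("lang", "?")
    name = meta.get("name", "?")
    version = meta.get("version", "?")
    return f"{lang}_{name} {version}"


def same_model(meta_a, meta_b):
    """True when both meta dicts describe the same package and version."""
    return describe_pipeline(meta_a) == describe_pipeline(meta_b)


# Typical use, assuming spaCy v2 with both names installed:
#   import spacy
#   print(describe_pipeline(spacy.load("en").meta))
#   print(describe_pipeline(spacy.load("en_core_web_sm").meta))
```

If the two lines printed differ, the two names point at different packages or versions, which would explain differing entity output.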
Upvotes: 2
Reputation: 722
On my system the results are the same in both cases.
Code:
import spacy
nlp = spacy.load("en_core_web_sm")
text = """But Google is starting from behind. The company made a late push
into hardware, and Apple’s Siri, available on iPhones, and Amazon’s
Alexa software, which runs on its Echo and Dot devices, have clear
leads in consumer adoption."""
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

import spacy
nlp = spacy.load("en")
text = """But Google is starting from behind. The company made a late push \
into hardware, and Apple’s Siri, available on iPhones, and Amazon’s \
Alexa software, which runs on its Echo and Dot devices, have clear
leads in consumer adoption."""
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
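To compare the two runs without reading the printed lists by eye, the entities can be reduced to sets and diffed. This is only a sketch: entity_tuples and compare_entities are hypothetical helper names, and the comparison is meaningful only when both pipelines are given the exact same string, since character offsets shift if one text contains line breaks and the other does not.

```python
def entity_tuples(doc):
    """Reduce a processed document to a set of (text, start, end, label)."""
    return {(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents}


def compare_entities(doc_a, doc_b):
    """Split the entities of two documents into shared and unshared groups."""
    a, b = entity_tuples(doc_a), entity_tuples(doc_b)
    return {
        "both": sorted(a & b),    # found by both pipelines
        "only_a": sorted(a - b),  # found only in doc_a
        "only_b": sorted(b - a),  # found only in doc_b
    }


# Typical use, assuming spaCy v2 with both names installed and one
# shared `text` string:
#   import spacy
#   diff = compare_entities(spacy.load("en_core_web_sm")(text),
#                           spacy.load("en")(text))
#   print(diff["only_a"], diff["only_b"])
```

Empty "only_a" and "only_b" lists mean both loads produced identical entities for that text.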
Upvotes: 0