Basudev

Reputation: 135

Spacy EN Model issue

I need to know the difference between spaCy's en and en_core_web_sm models.

I am trying to do NER with spaCy (for organization names). Please find below the script I am using:

import spacy
nlp = spacy.load("en_core_web_sm")
text = "But Google is starting from behind. The company made a late push \
    into hardware, and Apple’s Siri, available on iPhones, and Amazon’s \
    Alexa software, which runs on its Echo and Dot devices, have clear \
    leads in consumer adoption."
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

The above gives me no output. But when I use the "en" model:

import spacy
nlp = spacy.load("en")
text = "But Google is starting from behind. The company made a late push \
    into hardware, and Apple’s Siri, available on iPhones, and Amazon’s \
    Alexa software, which runs on its Echo and Dot devices, have clear \
    leads in consumer adoption."
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

it gives me the desired output:

Google 4 10 ORG
Apple’s Siri 92 104 ORG
iPhones 119 126 ORG
Amazon 132 138 ORG
Echo and Dot 182 194 ORG

What is going wrong here? Please help.

Can I use the en_core_web_sm model to get the same output as the en model? If so, please advise how to do it. A Python 3 script that takes a pandas DataFrame as input would be appreciated. Thanks.

Upvotes: 1

Views: 890

Answers (3)

sumeet siddhartha

Reputation: 16

Loading spacy.load('en_core_web_sm') instead of spacy.load('en') should help.

Upvotes: 0

Tiago Duque

Reputation: 2079

Each model is a machine learning model trained on a specific corpus (a text 'dataset'). As a result, different models can tag the same text differently, especially because some models were trained on less data than others.

spaCy currently offers four models for English, as listed at: https://spacy.io/models/en/

According to https://github.com/explosion/spacy-models, a model can be downloaded in several distinct ways:

# download best-matching version of specific model for your spaCy installation
python -m spacy download en_core_web_sm

# out-of-the-box: download best-matching default model
python -m spacy download en

Probably, when you downloaded the 'en' model, the best matching default model was not 'en_core_web_sm'.

Also, keep in mind that these models are updated every once in a while, which may have caused you to have two different versions of the same model.
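One way to check whether the two names really resolve to different models (or to different versions of the same one) is to read the metadata of each loaded pipeline. The following is only a minimal sketch: describe_pipeline and compare_pipelines are hypothetical helper names, and it assumes the "lang", "name" and "version" keys that spaCy stores in nlp.meta. Note that the "en" shortcut only exists in spaCy 2.x; in spaCy 3 it was removed, so loading it is expected to fail there.

```python
def describe_pipeline(nlp):
    """Return '<lang>_<name> <version>' built from a loaded pipeline's metadata.

    `nlp` is assumed to expose a dict-like `meta` attribute, as spaCy's
    Language objects do. Missing keys are shown as '?'.
    """
    meta = nlp.meta
    lang = meta.get("lang", "?")
    name = meta.get("name", "?")
    version = meta.get("version", "?")
    return f"{lang}_{name} {version}"


def compare_pipelines(names):
    """Load each pipeline by name and report what it actually resolved to."""
    # Imported here so describe_pipeline can be used without spaCy installed.
    import spacy

    report = {}
    for name in names:
        try:
            report[name] = describe_pipeline(spacy.load(name))
        except OSError as err:
            # spaCy raises OSError when a model or shortcut cannot be found.
            report[name] = f"not available ({err})"
    return report
```

Calling compare_pipelines(["en", "en_core_web_sm"]) and printing the result shows, for each name, the package and version that was loaded. If the two lines differ, the differing NER output is most likely explained by the models themselves rather than by the script.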

Upvotes: 2

Chandan Gupta

Reputation: 722

On my system, the results are the same in both cases.

Code:

import spacy
nlp = spacy.load("en_core_web_sm")
text = """But Google is starting from behind. The company made a late push 
into hardware, and Apple’s Siri, available on iPhones, and Amazon’s  
Alexa software, which runs on its Echo and Dot devices, have clear 
leads in consumer adoption."""
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

import spacy
nlp = spacy.load("en")
text = """But Google is starting from behind. The company made a late push \
into hardware, and Apple’s Siri, available on iPhones, and Amazon’s \
Alexa software, which runs on its Echo and Dot devices, have clear 
leads in consumer adoption."""
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
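
Since the question also asks for a version that takes a pandas DataFrame as input, the same loop can be wrapped roughly as follows. This is a sketch under assumptions, not a tested recipe for any particular setup: org_entities and run_example are hypothetical names, the column name "text" is made up for illustration, and the entities you get back will depend on the model and version installed.

```python
def org_entities(texts, nlp):
    """For each text, return a list of (text, start_char, end_char) for ORG entities.

    `texts` can be any iterable of strings (a list or a pandas Series);
    `nlp` is a loaded spaCy pipeline.
    """
    results = []
    # nlp.pipe processes the texts as a stream, which is usually faster
    # than calling nlp(text) once per row.
    for doc in nlp.pipe(texts):
        orgs = [
            (ent.text, ent.start_char, ent.end_char)
            for ent in doc.ents
            if ent.label_ == "ORG"
        ]
        results.append(orgs)
    return results


def run_example():
    """Illustrative usage; needs pandas, spaCy and en_core_web_sm installed."""
    import pandas as pd
    import spacy

    nlp = spacy.load("en_core_web_sm")
    df = pd.DataFrame(
        {
            "text": [
                "But Google is starting from behind.",
                "Amazon's Alexa software runs on its Echo and Dot devices.",
            ]
        }
    )
    # astype(str) guards against non-string cells such as NaN.
    orgs = org_entities(df["text"].astype(str), nlp)
    df["orgs"] = pd.Series(orgs, index=df.index)
    print(df)
```

Calling run_example() prints the DataFrame with an extra "orgs" column. Rows where the model finds no organization get an empty list, which makes it easy to see when a given model returns nothing.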

Upvotes: 0
