Reputation: 1126
Usually we start from:
nlp = spacy.load('en_encore_web_sm') # or medium, or large
or
nlp = English()
then:
doc = nlp('my text')
Then we can do a lot of fun with that even not knowing the nature of the first line.
But what exactly is 'nlp'? What is going on under the hood? Is "nlp" a pretrained model, as understood in machine learning, and therefore some big file located somewhere on the disc?
I met an explanation, that 'nlp' is an 'object, containing process pipeline', but that only explains a little.
Upvotes: 2
Views: 374
Reputation: 396
nlp
is a spaCy pipeline. You can see the details on it here: https://spacy.io/models/en#en_core_web_sm
Pipelines contain multiple components, in this case:
Hope this helps. There's more details in the documentation on Pipelines here: https://spacy.io/usage/processing-pipelines
Upvotes: 0
Reputation:
You could infer what nlp()
is by exploring it. For example:
import spacy
from spacy import displacy
nlp = spacy.load("en_core_web_lg")
text = "Elon Musk 889-888-8888 [email protected] Jeff Bezos (345)123-1234 [email protected] Reshma Saujani [email protected] 888-888-8888 Barkevious Mingo"
text = nlp(text)
print(text)
Will print the exact same text. On the other hand if you do:
for word in text.ents:
print(word.text,word.label_)
you will get the entities of the string:
Elon Musk PERSON
889-888 CARDINAL
Jeff Bezos PERSON
345)123 CARDINAL
Reshma Saujani PERSON
It is indeed large pre-trained model for the English language and has many functions (parser, lemmatizer, tagger) as the one demonstrated above. Hope this helps a bit to clarify your question.
Upvotes: 0
Reputation: 2510
You can always check the type of any python objects:
nlp = spacy.load('en_encore_web_sm') # or medium, or large
print(type(nlp))
print(dir(nlp)) # view a list of attributes
You will get something like this (depending on the passed arguments)
<class 'spacy.lang.en.English'>
You are right it is something like 'pretrained' model as it contains vocabulary, binary weights, etc.
Please check the official documentation:
Upvotes: 2