Andrew Anderson

Reputation: 1126

What is nlp in spacy?

Usually we start from:

nlp = spacy.load('en_core_web_sm')  # or the medium or large variant

or

nlp = English()

then:

doc = nlp('my text')

After that we can do a lot of fun things with doc, even without understanding what the first line actually does.

But what exactly is 'nlp'? What is going on under the hood? Is 'nlp' a pretrained model in the machine-learning sense, and therefore some big file located somewhere on disk?

I came across an explanation that 'nlp' is an 'object, containing process pipeline', but that explains very little.

Upvotes: 2

Views: 374

Answers (3)

NLP from scratch

Reputation: 396

nlp is a spaCy pipeline. You can see the details on it here: https://spacy.io/models/en#en_core_web_sm

Pipelines contain multiple components, in this case:

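  • tok2vec: Token-to-vector layer that turns each token into a vector for the downstream components (tokenization itself is done by a separate tokenizer that always runs first)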
  • tagger: Part-of-speech (POS) tagger
  • parser: Dependency parser
  • attribute_ruler: Attribute mapping based on rules
  • lemmatizer: Lemmatization (base forms of words)
  • ner: Named entity recognition

Hope this helps. There are more details in the documentation on pipelines here: https://spacy.io/usage/processing-pipelines
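
To make the "object containing a processing pipeline" idea concrete, here is a minimal pure-Python sketch. It is a toy, not spaCy's implementation: ToyLanguage, ToyDoc, lowercaser and title_case_flagger are hypothetical names made up for illustration, and only add_pipe and pipe_names are named after their spaCy counterparts (the real add_pipe has a different signature). The point is just the shape: a callable object that tokenizes the text and then passes the result through each component in order.

```python
# Toy illustration only: this is NOT how spaCy is implemented, just the
# general shape of "a callable object that holds a processing pipeline".
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToyDoc:
    """Holds the text, its tokens, and whatever the components add."""
    text: str
    tokens: List[str]
    annotations: Dict[str, list] = field(default_factory=dict)


Component = Callable[[ToyDoc], ToyDoc]


class ToyLanguage:
    """Tokenize first, then run every registered component in order."""

    def __init__(self) -> None:
        self.pipeline: List[Tuple[str, Component]] = []

    def add_pipe(self, name: str, component: Component) -> None:
        self.pipeline.append((name, component))

    @property
    def pipe_names(self) -> List[str]:
        return [name for name, _ in self.pipeline]

    def __call__(self, text: str) -> ToyDoc:
        # Naive whitespace tokenizer standing in for a real one.
        doc = ToyDoc(text=text, tokens=text.split())
        for _, component in self.pipeline:
            doc = component(doc)
        return doc


def lowercaser(doc: ToyDoc) -> ToyDoc:
    # Hypothetical rule-based component: stores a lowercase form per token.
    doc.annotations["lower"] = [t.lower() for t in doc.tokens]
    return doc


def title_case_flagger(doc: ToyDoc) -> ToyDoc:
    # Hypothetical stand-in for a trained component such as "ner";
    # a real one would use learned weights instead of this simple rule.
    doc.annotations["capitalized"] = [t for t in doc.tokens if t[:1].isupper()]
    return doc


nlp = ToyLanguage()
nlp.add_pipe("lowercaser", lowercaser)
nlp.add_pipe("title_case_flagger", title_case_flagger)

doc = nlp("Elon Musk founded SpaceX")
print(nlp.pipe_names)
print(doc.tokens)
print(doc.annotations["capitalized"])
```

In this picture, a ToyLanguage with no components added is roughly what English() gives you in spaCy (a tokenizer and an empty pipeline), while spacy.load(...) returns an object whose components are already in place and carry trained weights.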

Upvotes: 0

anon

Reputation:

You could infer what nlp() is by exploring it. For example:

import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_lg")

text = "Elon Musk 889-888-8888 [email protected] Jeff Bezos (345)123-1234 [email protected] Reshma Saujani [email protected] 888-888-8888 Barkevious Mingo"

text = nlp(text)

print(text)

This prints the exact same text. On the other hand, if you do:

for ent in text.ents:
    print(ent.text, ent.label_)

you will get the named entities found in the string:

Elon Musk PERSON
889-888 CARDINAL
Jeff Bezos PERSON
345)123 CARDINAL
Reshma Saujani PERSON

It is indeed a large pre-trained model for the English language, and it bundles many components (parser, lemmatizer, tagger) in addition to the entity recognizer demonstrated above. Hope this helps a bit to clarify your question.

Upvotes: 0

u1234x1234

Reputation: 2510

You can always check the type of any Python object:

import spacy

nlp = spacy.load('en_core_web_sm')  # or the medium or large variant
print(type(nlp))
print(dir(nlp))  # view a list of attributes

You will get something like this (depending on the arguments passed):

<class 'spacy.lang.en.English'>

You are right: it is something like a 'pretrained' model, as it contains a vocabulary, binary weights, etc.

Please check the official documentation:

https://spacy.io/api/language
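
On the "big file located somewhere on disk" part: trained pipelines such as en_core_web_sm are installed as regular Python packages, so you can look for the package directory with the standard library alone. The sketch below is one way to do that; find_package_dir and dir_size_mb are hypothetical helper names, not part of spaCy, and the script simply reports that the package is missing if it is not installed in your environment.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_package_dir(package_name: str) -> Optional[Path]:
    """Return the install directory of a top-level package, or None if absent."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or a single-file module rather than a package
    return Path(list(spec.submodule_search_locations)[0])


def dir_size_mb(path: Path) -> float:
    """Total size of all files under path, in megabytes."""
    total = sum(p.stat().st_size for p in path.rglob("*") if p.is_file())
    return total / 1_000_000


# Assumes the pipeline was installed under its usual package name.
pkg_dir = find_package_dir("en_core_web_sm")
if pkg_dir is None:
    print("en_core_web_sm is not installed in this environment")
else:
    print(pkg_dir, f"{dir_size_mb(pkg_dir):.1f} MB")
```

If the package is found, browsing that directory should show the pipeline's configuration and metadata next to the binary data for the individual components, which is what spacy.load reads when it builds the nlp object.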

Upvotes: 2
