Emiliano Viotti

Reputation: 1709

How to get a description for each Spacy NER entity?

I am using the spaCy NER model to extract named entities relevant to my problem from a text, such as DATE, TIME and GPE, among others.

For example, I need to recognize the time zone in the following text:

"Australian Central Time"

With the spaCy model en_core_web_lg, I get the following result:

import spacy
nlp = spacy.load("en_core_web_lg")
doc = nlp("Australian Central Time")
print([(ent.label_, ent.text) for ent in doc.ents])

>> [('NORP', 'Australian')]

My problem is that I don't have a clear idea of what the NORP entity means, and more generally what each spaCy NER entity means (leaving aside the self-explanatory ones, of course).

I found the following snippet to get the complete list of entity labels, but I'm stuck after that:

import spacy
nlp = spacy.load("en_core_web_lg")
print(nlp.get_pipe("ner").labels)

I'm pretty new to spaCy and didn't find what I was looking for in the official documentation, so any help will be appreciated!

BTW, I'm using spaCy version 3.2.1.

Upvotes: 7

Views: 8511

Answers (3)

user1420372

Reputation: 2187

This prints each NER label together with its description:

import spacy

nlp = spacy.load("en_core_web_trf", disable=["tagger", "parser", "attribute_ruler", "lemmatizer"])
for label in nlp.get_pipe("ner").labels:
    print(f"{label}: {spacy.explain(label)}")

Output:

CARDINAL: Numerals that do not fall under another type
DATE: Absolute or relative dates or periods
EVENT: Named hurricanes, battles, wars, sports events, etc.
FAC: Buildings, airports, highways, bridges, etc.
GPE: Countries, cities, states
LANGUAGE: Any named language
LAW: Named documents made into laws.
LOC: Non-GPE locations, mountain ranges, bodies of water
MONEY: Monetary values, including unit
NORP: Nationalities or religious or political groups
ORDINAL: "first", "second", etc.
ORG: Companies, agencies, institutions, etc.
PERCENT: Percentage, including "%"
PERSON: People, including fictional
PRODUCT: Objects, vehicles, foods, etc. (not services)
QUANTITY: Measurements, as of weight or distance
TIME: Times smaller than a day
WORK_OF_ART: Titles of books, songs, etc.
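The same lookup also works per entity found in a text, rather than per label. A minimal sketch, assuming spaCy v3: describe_entities is a hypothetical helper name, and the example tags the NORP span by hand on a blank English pipeline so it runs without downloading a model; with en_core_web_lg you would pass nlp(text) directly.

```python
import spacy
from spacy.tokens import Span


def describe_entities(doc):
    """Return (text, label, description) for every entity in a Doc.

    describe_entities is a hypothetical helper; spacy.explain returns None
    for labels that have no glossary entry.
    """
    return [(ent.text, ent.label_, spacy.explain(ent.label_)) for ent in doc.ents]


# Blank pipeline: tokenizer only, no trained NER, so no model download is needed.
nlp = spacy.blank("en")
doc = nlp("Australian Central Time")
# Mimic the en_core_web_lg result from the question by labelling token 0 as NORP.
doc.ents = [Span(doc, 0, 1, label="NORP")]

for text, label, description in describe_entities(doc):
    print(f"{text} -> {label}: {description}")
```

With a trained pipeline the hand-made span is unnecessary: describe_entities(nlp("Australian Central Time")) returns whatever entities the model predicts.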

Upvotes: 2

Huyen

Reputation: 533

The full list is below. As of February 2023, the English models have 18 NER labels.

PERSON:      People, including fictional.
NORP:        Nationalities or religious or political groups.
FAC:         Buildings, airports, highways, bridges, etc.
ORG:         Companies, agencies, institutions, etc.
GPE:         Countries, cities, states.
LOC:         Non-GPE locations, mountain ranges, bodies of water.
PRODUCT:     Objects, vehicles, foods, etc. (Not services.)
EVENT:       Named hurricanes, battles, wars, sports events, etc.
WORK_OF_ART: Titles of books, songs, etc.
LAW:         Named documents made into laws.
LANGUAGE:    Any named language.
DATE:        Absolute or relative dates or periods.
TIME:        Times smaller than a day.
PERCENT:     Percentage, including “%”.
MONEY:       Monetary values, including unit.
QUANTITY:    Measurements, as of weight or distance.
ORDINAL:     “first”, “second”, etc.
CARDINAL:    Numerals that do not fall under another type.

Source: Mikael Davidsson on Medium.
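Because this list is a snapshot, it can be worth checking it against the labels a given model actually reports. A small sketch: DOCUMENTED_LABELS simply copies the 18 names above and compare_labels is a hypothetical helper; with a real pipeline you would pass nlp.get_pipe("ner").labels as model_labels.

```python
# The 18 labels listed above (snapshot as of February 2023).
DOCUMENTED_LABELS = {
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT",
    "MONEY", "QUANTITY", "ORDINAL", "CARDINAL",
}


def compare_labels(model_labels, documented=DOCUMENTED_LABELS):
    """Return (missing, extra): documented labels the model lacks, and
    model labels the list does not document. Hypothetical helper."""
    model_set = set(model_labels)
    missing = sorted(documented - model_set)
    extra = sorted(model_set - documented)
    return missing, extra


# Example with a made-up label tuple; with a real pipeline use
# compare_labels(nlp.get_pipe("ner").labels).
missing, extra = compare_labels(("CARDINAL", "DATE", "NORP", "MY_LABEL"))
print("missing:", len(missing), "extra:", extra)
```

An empty result on both sides means the model's label set matches the list exactly.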

Upvotes: 7

aab

Reputation: 11474

Most labels have definitions you can access using spacy.explain(label).

For NORP: "Nationalities or religious or political groups"
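Since only most labels have a glossary entry, code that builds descriptions for arbitrary pipelines (for example one trained with custom labels) should handle the None that spacy.explain returns for unknown terms; in spaCy v3 it also emits a warning in that case. A minimal sketch, where label_descriptions and MY_CUSTOM_LABEL are hypothetical:

```python
import warnings

import spacy


def label_descriptions(labels, fallback="(no description in spaCy's glossary)"):
    """Map each label to spacy.explain(label), or to fallback when the
    glossary has no entry. label_descriptions is a hypothetical helper."""
    descriptions = {}
    for label in labels:
        with warnings.catch_warnings():
            # spaCy v3 warns when a term is not in its glossary; silence it here.
            warnings.simplefilter("ignore")
            description = spacy.explain(label)
        descriptions[label] = description if description is not None else fallback
    return descriptions


# With a loaded pipeline: label_descriptions(nlp.get_pipe("ner").labels)
for label, text in label_descriptions(["NORP", "GPE", "MY_CUSTOM_LABEL"]).items():
    print(f"{label}: {text}")
```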

For more detail, you need to look at the annotation guidelines of the resources listed in the model documentation at https://spacy.io/models/.
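Trained pipelines also record the resources they were built from in their metadata, under nlp.meta["sources"] (for the English pipelines this includes OntoNotes 5, whose guidelines define the entity types). The sketch below only reads names and URLs from such a dict; list_sources is a hypothetical helper and sample_meta is placeholder data, not a real model's metadata.

```python
def list_sources(meta):
    """Return (name, url) pairs from a pipeline's meta dict.

    list_sources is a hypothetical helper; pass nlp.meta from a loaded
    pipeline. A missing "sources" key yields an empty list, since blank
    pipelines do not list any.
    """
    return [(src.get("name", "?"), src.get("url", "")) for src in meta.get("sources", [])]


# Placeholder metadata shaped like a trained pipeline's meta entry.
sample_meta = {
    "name": "core_web_lg",
    "sources": [
        {"name": "Example corpus", "url": "https://example.org/corpus"},
    ],
}

for name, url in list_sources(sample_meta):
    print(f"{name}: {url}")
```

With a real model this would be list_sources(spacy.load("en_core_web_lg").meta).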

Upvotes: 7
