eboraks


spaCy Wikipedia Entity Linker - Resulting NLP Model Has No KB Entities

I have been learning how to use the spaCy Entity Linker with the Wikipedia example here.

I started with a small training set of 2,000 articles (it ran for 20 hours), but the resulting model does not recognize or return any KB entities, even for text that was used in training.

nlp_kb.from_disk("/path/to/nel-wikipedia/output_lt_kb80k_model_vsm/nlp") 

text = "Anarchism is a political philosophy and movement that rejects all involuntary, coercive forms of hierarchy. It calls for the abolition of the state which it holds to be undesirable, unnecessary and harmful. It is usually described alongside libertarian Marxism as the libertarian wing (libertarian socialism) of the socialist movement and as having a historical association with anti-capitalism and socialism. The history of anarchism goes back to prehistory, when some humans lived in anarchistic societies long before the establishment of formal states, realms or empires. With the rise of organised hierarchical bodies, skepticism toward authority also rose, but it was not until the 19th century that a self-conscious political movement emerged. During the latter half of the 19th and the first decades of the 20th century, the anarchist movement flourished in most parts of the world and had a significant role in workers' struggles for emancipation. Various anarchist schools of thought formed during this period. Anarchists have taken part in several revolutions, most notably in the Spanish Civil War, whose end marked the end of the classical era of anarchism. In the last decades of the 20th century and into the 21st century, the anarchist movement has been resurgent once more. Anarchism employs various tactics in order to meet its ideal ends; these can be broadly separated into revolutionary and evolutionary tactics."


doc = nlp_kb(text)
for ent in doc.ents:
    print(ent.text, ent.label_, ent.kb_id_)

Results

the 19th century DATE 
the latter half of the 19th and the first decades of the 20th century DATE 
Anarchists NORP 
the Spanish Civil War EVENT 
the last decades of the 20th century DATE 
the 21st century DATE

The loaded model doesn't have an entity linker component in its pipeline:

nlp_kb.meta["pipeline"]
['tagger', 'parser', 'ner']

But the model's meta.json lists it:

{
  "lang":"en",
  "name":"core_web_lg",
  "license":"MIT",
  "author":"Explosion",
  "url":"https://explosion.ai",
  "email":"contact@explosion.ai",
  "description":"English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, POS tags, dependency parses and named entities.",
  "sources":[
    {
      "name":"OntoNotes 5",
      "url":"https://catalog.ldc.upenn.edu/LDC2013T19",
      "license":"commercial (licensed by Explosion)"
    },
    {
      "name":"GloVe Common Crawl",
      "author":"Jeffrey Pennington, Richard Socher, and Christopher D. Manning",
      "url":"https://nlp.stanford.edu/projects/glove/",
      "license":"Public Domain Dedication and License v1.0"
    }
  ],
  "pipeline":[
    "tagger",
    "parser",
    "ner",
    "entity_linker"
  ],
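One quick way to confirm this mismatch is to compare the component names recorded in meta.json with the names the loaded object reports (in spaCy, `nlp.pipe_names`). The sketch below uses only the standard library; `missing_components` is a hypothetical helper, and it assumes meta.json sits directly in the model directory, as it does here.

```python
import json
from pathlib import Path
from typing import Iterable, List


def missing_components(model_dir: str, loaded_pipe_names: Iterable[str]) -> List[str]:
    """Return components listed in <model_dir>/meta.json that the loaded pipeline lacks.

    Hypothetical helper: it only reads meta.json and never imports spaCy.
    """
    meta = json.loads((Path(model_dir) / "meta.json").read_text(encoding="utf-8"))
    loaded = set(loaded_pipe_names)
    # Keep the order used in meta.json so the result is easy to compare by eye.
    return [name for name in meta.get("pipeline", []) if name not in loaded]
```

Assuming `nlp_kb.pipe_names` matches the `['tagger', 'parser', 'ner']` shown above, calling `missing_components("/path/to/nel-wikipedia/output_lt_kb80k_model_vsm/nlp", nlp_kb.pipe_names)` should point straight at `entity_linker`.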

Here are the contents of the nlp directory:

(spacy) ➜  nlp git:(master) ✗ ls
entity_linker meta.json     ner           parser        tagger        tokenizer     vocab

(spacy) ➜  nlp git:(master) ✗ ls -l entity_linker
total 55040
-rw-r--r--  1 staff       323 Sep  8 04:40 cfg
-rw-r--r--  1 staff  25294844 Sep  8 04:40 kb
-rw-r--r--  1 staff   2875799 Sep  8 04:40 model
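spaCy v2 serializes each pipeline component into a subdirectory named after it (the `entity_linker` folder with `cfg`, `kb` and `model` is one such entry), so a similar check can confirm that the files on disk are complete. This is a hypothetical stdlib-only helper, not part of spaCy; it assumes one entry per component, named exactly as in meta.json.

```python
import json
from pathlib import Path
from typing import Dict


def component_dirs_on_disk(model_dir: str) -> Dict[str, bool]:
    """Map each name in meta.json["pipeline"] to whether <model_dir>/<name> exists.

    Hypothetical helper; it assumes the one-entry-per-component layout
    shown in the directory listing above.
    """
    root = Path(model_dir)
    meta = json.loads((root / "meta.json").read_text(encoding="utf-8"))
    return {name: (root / name).exists() for name in meta.get("pipeline", [])}
```

If every value comes back `True`, the saved model is complete and the problem lies in how it is loaded rather than in training.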

I am assuming I am loading the model wrong, but I am not sure how to fix it.



Answers (1)

Sofie VL


You've used this line:

nlp_kb.from_disk("/path/to/nel-wikipedia/output_lt_kb80k_model_vsm/nlp") 

which loads trained weights from disk into the components that nlp_kb already has. It doesn't change the structure of the nlp_kb object, though, and it won't automagically add new components.
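To make the difference concrete, here is a deliberately simplified toy model in plain Python. `ToyPipeline` and `toy_load` are hypothetical names, and this is not spaCy's code: it only mirrors the behavior described here, where restoring from disk fills in existing components, while loading builds the component list from meta.json first (roughly the order spacy.load follows in spaCy v2).

```python
import json
from pathlib import Path
from typing import Dict, List


class ToyPipeline:
    """A simplified stand-in for a spaCy Language object (illustration only)."""

    def __init__(self, pipe_names: List[str]):
        self.components: Dict[str, dict] = {name: {} for name in pipe_names}

    @property
    def pipe_names(self) -> List[str]:
        return list(self.components)

    def from_disk(self, model_dir: str) -> "ToyPipeline":
        # Restore data only for components this object already has;
        # extra component directories on disk are simply ignored.
        root = Path(model_dir)
        for name in self.components:
            if (root / name).exists():
                self.components[name] = {"loaded_from": str(root / name)}
        return self


def toy_load(model_dir: str) -> ToyPipeline:
    # Build the component list from meta.json, then restore each component's data.
    meta = json.loads((Path(model_dir) / "meta.json").read_text(encoding="utf-8"))
    return ToyPipeline(meta.get("pipeline", [])).from_disk(model_dir)
```

With the directory from the question, `ToyPipeline(["tagger", "parser", "ner"]).from_disk(...)` would still have no `entity_linker`, whereas `toy_load(...)` would include it.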

Instead, what you want to do is

nlp_el = spacy.load("/path/to/nel-wikipedia/output_lt_kb80k_model_vsm/nlp")

and then you should have a new NLP object with the entity_linker component.
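A minimal sketch of that fix, assuming spaCy v2.2+ (where `Span.kb_id_` is available) and the model path from the question. The helper names are hypothetical; the entity filter is kept separate so it can be reused on any `Doc`.

```python
from typing import Any, List, Tuple


def load_entity_linking_pipeline(model_dir: str) -> Any:
    """Load the full pipeline, components included, from a model directory."""
    import spacy  # imported here so linked_entities can be used without spaCy installed

    nlp = spacy.load(model_dir)
    if "entity_linker" not in nlp.pipe_names:
        raise ValueError(f"no entity_linker in {nlp.pipe_names!r}; check meta.json in {model_dir}")
    return nlp


def linked_entities(doc: Any) -> List[Tuple[str, str, str]]:
    """Return (text, label, kb_id) for entities that were resolved to a KB ID."""
    # ent.kb_id_ is an empty string when no KB identifier was assigned,
    # which matches the blank third column in the question's output.
    return [(ent.text, ent.label_, ent.kb_id_) for ent in doc.ents if ent.kb_id_]
```

For example, `linked_entities(load_entity_linking_pipeline("/path/to/nel-wikipedia/output_lt_kb80k_model_vsm/nlp")(text))` should return only the entities that actually received a KB ID. With a small training set, some entities may still come back unlinked.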

