I have the following code:
import spacy
from spacy.tokens import Span
import en_core_web_lg
nlpsm = en_core_web_lg.load()
doc = nlpsm(text)
finalwor = []
fil = [i for i in doc.ents if i.label_.lower() in ["person"]]
fil_a = [i for i in doc.ents if i.label_.lower() in ['GPE']]
fil_b = [i for i in doc.ents if i.label_.lower() in ['ORG']]
for chunk in doc.noun_chunks:
    if chunk not in fil and chunk not in fil_a and chunk not in fil_b:
        finalwor=list(doc.noun_chunks)
        print("finalwor after noun_chunk", finalwor)
    else:
        chunk in fil_a and chunk in fil_b
        entword=list(str(chunk.text).replace(str(chunk.text),""))
        finalwor.extend(entword)
I am not sure what I am doing wrong here. If the text is 'IT manager at Google', my current output is 'IT manager, Google'.
The output I want is 'IT manager'.
Basically, I want company names and GPE (geopolitical entity) names to be replaced by an empty string, or simply deleted.
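I think the problem is this line: finalwor=list(doc.noun_chunks). Every time a chunk passes your test, it replaces finalwor with all the noun chunks in your doc, instead of adding just the chunk that satisfied the condition.

There are two smaller issues as well. Because you call .lower() on the label, comparing it against ['GPE'] and ['ORG'] can never match, so those lists need lowercase values ('gpe', 'org'). And the else branch has no effect: chunk in fil_a and chunk in fil_b is evaluated and thrown away, and the replace call turns the chunk text into an empty string, so list(...) is empty and extend adds nothing.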
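To see why reassigning the list inside the loop goes wrong, here is a minimal sketch with plain strings standing in for spaCy spans (the names chunks, entities, overwritten and appended are made up for illustration):

```python
# Plain strings stand in for spaCy noun chunks and entities.
chunks = ["IT manager", "Google"]
entities = ["Google"]

# Buggy pattern: the whole list is replaced each time a chunk passes,
# so it always ends up holding every chunk.
overwritten = []
for chunk in chunks:
    if chunk not in entities:
        overwritten = list(chunks)

# Fixed pattern: only the chunk that passed the test is added.
appended = []
for chunk in chunks:
    if chunk not in entities:
        appended.append(chunk)

print(overwritten)  # ['IT manager', 'Google']
print(appended)     # ['IT manager']
```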
You might be looking for something like this:
import spacy
from spacy.tokens import Span
import en_core_web_lg
nlpsm = en_core_web_lg.load()
doc = nlpsm('Maria, IT manager at Google and gardener')
finalwor = []
fil = [i for i in doc.ents if i.label_.lower() in ["person"]]
fil_a = [i for i in doc.ents if i.label_.lower() in ['gpe']]
fil_b = [i for i in doc.ents if i.label_.lower() in ['org']]
for chunk in doc.noun_chunks:
    if chunk not in fil and chunk not in fil_a and chunk not in fil_b:
        finalwor.append(chunk)
print("finalwor after noun_chunk", finalwor)

Output:
finalwor after noun_chunk [IT manager, gardener]
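If you would rather delete only the entity words inside each chunk (so a phrase like "the Google office" becomes "the office" instead of being kept whole), one option is to compare token positions rather than Span objects. This also avoids relying on Span equality, which in newer spaCy releases may take the span label into account (noun chunks and entities carry different labels). The helper strip_entities below is hypothetical, not part of spaCy; it only assumes spans expose .start, .end (token offsets) and .label_, so it is shown here with hand-built stand-ins rather than a loaded model:

```python
from collections import namedtuple

# Entity labels whose words should be deleted (spaCy's label names).
DROP_LABELS = {"PERSON", "GPE", "ORG"}


def strip_entities(chunks, ents, tokens, drop_labels=DROP_LABELS):
    """Return each chunk's text with unwanted entity tokens removed.

    Assumes `chunks` and `ents` expose token offsets `.start`/`.end`
    (and `.label_` for ents), as spaCy Span objects do, and that
    `tokens` holds the text of every token in the document.
    """
    # Collect every token position covered by an entity we want gone.
    # .upper() guards against the label-case mismatch from the question.
    banned = {
        i
        for ent in ents
        if ent.label_.upper() in drop_labels
        for i in range(ent.start, ent.end)
    }
    result = []
    for chunk in chunks:
        kept = [tokens[i] for i in range(chunk.start, chunk.end) if i not in banned]
        # A chunk made only of entity tokens (e.g. "Google") disappears entirely.
        if kept:
            result.append(" ".join(kept))
    return result


# Hand-built stand-ins for spaCy spans so the sketch runs without a model;
# these offsets are illustrative, not real parser output.
FakeSpan = namedtuple("FakeSpan", ["start", "end", "label_"])
tokens = ["Maria", ",", "IT", "manager", "at", "the", "Google", "office"]
chunks = [FakeSpan(0, 1, "NP"), FakeSpan(2, 4, "NP"), FakeSpan(5, 8, "NP")]
ents = [FakeSpan(0, 1, "PERSON"), FakeSpan(6, 7, "ORG")]

print(strip_entities(chunks, ents, tokens))  # ['IT manager', 'the office']
```

With a real document, the call would look like strip_entities(doc.noun_chunks, doc.ents, [t.text for t in doc]). Joining the kept tokens with single spaces is a simplification and does not reproduce the original spacing.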