user10083444

Reputation: 105

How to remove ORG names and GPE from noun chunk in spacy

I have the following code:

import spacy
from spacy.tokens import Span
import en_core_web_lg
nlpsm = en_core_web_lg.load()

doc = nlpsm(text)

finalwor = []
fil = [i for i in doc.ents if i.label_.lower() in ["person"]]
fil_a = [i for i in doc.ents if i.label_.lower() in ['GPE']]
fil_b = [i for i in doc.ents if i.label_.lower() in ['ORG']]
for chunk in doc.noun_chunks:
    if chunk not in fil and chunk not in fil_a and chunk not in fil_b:
        finalwor=list(doc.noun_chunks)
        print("finalwor after noun_chunk", finalwor)
    else:
        chunk in fil_a and chunk in fil_b
        entword=list(str(chunk.text).replace(str(chunk.text),""))
        finalwor.extend(entword)

I am not sure what I am doing wrong here. If the text is 'IT manager at Google',

my current output is 'IT manager, Google'.

The ideal output that I want is 'IT manager'.

Basically, I want the company names and GPE names to be replaced by an empty string, or simply deleted.

Upvotes: 1

Views: 356

Answers (1)

dee

Reputation: 26

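I think the problem is the line finalwor=list(doc.noun_chunks): it assigns every noun chunk in the doc to finalwor, instead of appending only the chunk that passed your condition. Also note that i.label_.lower() can never match the upper-case strings 'GPE' and 'ORG', so fil_a and fil_b are always empty in your code.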

You might be looking for something like this:

import spacy
import en_core_web_lg

nlpsm = en_core_web_lg.load()

doc = nlpsm('Maria, IT manager at Google and gardener')

# Entity spans to exclude. The labels are lower-cased before comparing,
# so the strings in the lists must be lower-case too.
fil = [i for i in doc.ents if i.label_.lower() in ["person"]]
fil_a = [i for i in doc.ents if i.label_.lower() in ['gpe']]
fil_b = [i for i in doc.ents if i.label_.lower() in ['org']]

finalwor = []
for chunk in doc.noun_chunks:
    # Keep the chunk only if it is not one of the excluded entities
    if chunk not in fil and chunk not in fil_a and chunk not in fil_b:
        finalwor.append(chunk)

print("finalwor after noun_chunk", finalwor)

finalwor after noun_chunk [IT manager, gardener]
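
One caveat: the check above relies on a noun chunk comparing equal to an entity span. Depending on the spaCy version, that comparison may also take the span label into account, and it does not catch a chunk that merely contains an entity (for example a chunk such as "the Google office"). A minimal sketch that compares token offsets instead is shown below. The helper names overlaps and chunks_without_entities are made up for this illustration; they only use the start, end, label_, ents and noun_chunks attributes of spaCy spans and docs.

```python
def overlaps(span, entity):
    """Return True if the two spans share at least one token."""
    return span.start < entity.end and entity.start < span.end


def chunks_without_entities(doc, labels=("PERSON", "GPE", "ORG")):
    """Return the noun chunks of `doc` that do not overlap any entity
    whose label is in `labels` (hypothetical helper, not a spaCy API)."""
    unwanted = [ent for ent in doc.ents if ent.label_ in labels]
    return [
        chunk
        for chunk in doc.noun_chunks
        if not any(overlaps(chunk, ent) for ent in unwanted)
    ]
```

With the pipeline loaded as above, the call would be chunks_without_entities(nlpsm('IT manager at Google')). Which chunks and entities are found still depends on the model, so the result can differ between model versions.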

Upvotes: 1
