Ahikoteru

Reputation: 57

Python Spacy replace value of ent.label_ == PERSON with something else

I am using spaCy in Python to replace every entity whose label_ is "PERSON" with "[XXX]". Detecting the entities seems to work, but I am struggling to replace them in my test string:

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")

file_text = """This is my teststring. Isaac Newton is supposed to be changed."""

nlp.add_pipe("merge_entities")

def change_names(file_text):
    text_doc = nlp(file_text)
    mylist = []
    for ent in text_doc.ents:
        if ent.label_ == "PERSON":
            print(ent)
            mylist.append("[XXX]")
        else:
            mylist.append(ent.text)
    res = ''.join(mylist)
    print(res)
    print(text_doc)

change_names(file_text)

This results in:

Isaac Newton
[XXX]
This is my teststring. Isaac Newton is supposed to be changed.

The result should be: This is my teststring. [XXX] is supposed to be changed.

Now I want to iterate over text_doc and replace every entity with label_ == "PERSON" with "[XXX]", but I cannot get this to work. I tried a nested for loop that iterates over the string and, whenever an item is an entity, runs the loop posted above. Any suggestions?

Upvotes: 3

Views: 1944

Answers (1)

Wiktor Stribiżew

Reputation: 627292

Since all you need is a string output, you can iterate over the tokens of text_doc instead of over text_doc.ents:

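# text_doc is the Doc returned by nlp(file_text) in the question
result = []
for t in text_doc:
    if t.ent_type_ == "PERSON":
        # Token is part of a PERSON entity: emit the placeholder instead
        result.append("[XXX]")
    else:
        result.append(t.text)
    # Keep the original spacing after each token
    result.append(t.whitespace_)

res = ''.join(result)
print(res)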

That is:

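  • If the token belongs to a PERSON entity (t.ent_type_ == "PERSON"), append [XXX] to the result list
  • Otherwise, append the current token's text
  • In both cases, append the token's trailing whitespace (t.whitespace_), which is an empty string when there is none.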

Finally, join the items in result into a single string.
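Note that the token loop yields a single [XXX] for "Isaac Newton" only because the merge_entities pipe from the question has merged the entity into one token; without it, each token of the name would get its own placeholder. As an alternative, here is a minimal sketch that stays closer to the question's original idea of iterating over text_doc.ents and stitches the text back together using character offsets. The function name redact_entities and its label and placeholder parameters are hypothetical names chosen for illustration; the sketch relies only on the Doc.text, Doc.ents, Span.label_, Span.start_char and Span.end_char attributes.

```python
def redact_entities(doc, label="PERSON", placeholder="[XXX]"):
    """Return doc.text with every entity of the given label replaced.

    `doc` is expected to be a spaCy Doc (anything exposing .text and
    .ents with .label_, .start_char and .end_char works the same way).
    The function name and parameters are illustrative, not spaCy API.
    """
    parts = []
    last = 0  # character offset up to which the text has been copied
    for ent in doc.ents:  # doc.ents is ordered by position in the text
        if ent.label_ == label:
            # Copy the untouched text before the entity, then the placeholder
            parts.append(doc.text[last:ent.start_char])
            parts.append(placeholder)
            last = ent.end_char
    # Copy whatever follows the last replaced entity
    parts.append(doc.text[last:])
    return "".join(parts)
```

With the pipeline from the question, this would be called as redact_entities(nlp(file_text)). Because it works on entity spans rather than tokens, it should behave the same whether or not merge_entities is in the pipeline.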

Upvotes: 3
