Reputation: 57
I am using spaCy in Python to replace any entity whose label_ == "PERSON" with "[XXX]". Detecting the entities seems to work, but I am struggling to replace them in my test string:
import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
file_text = """This is my teststring. Isaac Newton is supposed to be changed."""
nlp.add_pipe("merge_entities")
def change_names(file_text):
    text_doc = nlp(file_text)
    mylist = []
    for ent in text_doc.ents:
        if ent.label_ == "PERSON":
            print(ent)
            mylist.append("[XXX]")
        else:
            mylist.append(ent.text)
    res = ''.join(mylist)
    print(res)
    print(text_doc)
change_names(file_text)
This results in:
Isaac Newton
[XXX]
This is my teststring. Isaac Newton is supposed to be changed.
Result should be: This is my teststring. [XXX] is supposed to be changed
Now I want to iterate over my text_doc and replace any ent with label_ == "PERSON" with "[XXX]", but I cannot get it to work. I tried a nested for loop that iterates over the string and, whenever an item is an entity, jumps into the for loop I posted here. Any suggestions?
Upvotes: 3
Views: 1944
Reputation: 627292
Since all you need is a string output, you can use
result = []
for t in text_doc:
    if t.ent_type_ == "PERSON":
        result.append("[XXX]")
    else:
        result.append(t.text)
    result.append(t.whitespace_)
res = ''.join(result)
print(res)
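That is: when a token belongs to a PERSON entity, append [XXX] to the result list; otherwise, append the token text. In both cases, append the whitespace that follows the token, if any. Then, in the end, join the result items.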
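Note that the token loop emits one [XXX] per token, so it relies on the merge_entities pipe from the question to collapse "Isaac Newton" into a single token. If you would rather not depend on that, one alternative is to work with character offsets instead. Below is a minimal sketch of that idea; redact_spans is a hypothetical helper, not part of spaCy, and the offsets are written by hand here so the snippet runs without a model. With spaCy, the spans would presumably come from ent.start_char, ent.end_char and ent.label_ for each ent in text_doc.ents.

```python
def redact_spans(text, spans, label="PERSON", mask="[XXX]"):
    """Replace each (start, end, label) span whose label matches with mask.

    Hypothetical helper for illustration; assumes the spans do not overlap.
    """
    pieces = []
    cursor = 0  # position in text up to which output has been produced
    for start, end, span_label in sorted(spans):
        if span_label != label:
            continue  # leave other entity types untouched
        pieces.append(text[cursor:start])  # text before the entity
        pieces.append(mask)                # the replacement itself
        cursor = end                       # skip past the entity
    pieces.append(text[cursor:])           # remainder after the last entity
    return "".join(pieces)


text = "This is my teststring. Isaac Newton is supposed to be changed."

# Offsets built by hand for this example. With spaCy they could be built as:
# spans = [(ent.start_char, ent.end_char, ent.label_) for ent in nlp(text).ents]
start = text.index("Isaac Newton")
spans = [(start, start + len("Isaac Newton"), "PERSON")]

print(redact_spans(text, spans))
```

Because the original string is sliced directly, all spacing and punctuation outside the entities is preserved without having to track t.whitespace_.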
Upvotes: 3