Reputation: 21
import spacy
import en_core_web_sm
import re
nlp = en_core_web_sm.load()
document_string= 'Electronically signed by : John Douglas.; Jun 13 2018 11:13AM CST, Adam Smith.'
nlp_doc = nlp(document_string)
from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)
pattern = [{'POS': 'PROPN'}, {'POS': 'PROPN'}]
matcher.add('FULL_NAME', None, pattern)
matches = matcher(nlp_doc)
for match_id, start, end in matches:
    span = nlp_doc[start:end]
    names = span.text
    print(span.text)
Output:
John Douglas
Adam Smith
I need to replace each matched name with [hidden] and then print document_string with the names from the previous output hidden.
Required output:
Electronically signed by : [hidden].; Jun 13 2018 11:13AM CST, [hidden].
Upvotes: 0
Views: 1155
Reputation: 169354
You can simply use str.replace() here: collect the matched names in a list, then replace each one in the document text.
new_doc = nlp_doc.text
names = []
pattern = [{'POS': 'PROPN'}, {'POS': 'PROPN'}]
# spaCy v2 signature; in spaCy v3 this is matcher.add('FULL_NAME', [pattern])
matcher.add('FULL_NAME', None, pattern)
matches = matcher(nlp_doc)
for match_id, start, end in matches:
    span = nlp_doc[start:end]
    names.append(span.text)
for name in names:
    new_doc = new_doc.replace(name, '[hidden]')
Result:
In [114]: new_doc
Out[114]: 'Electronically signed by : [hidden].; Jun 13 2018 11:13AM CST, [hidden].'
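One caveat: str.replace() swaps every occurrence of the name text, wherever it appears in the string, not only the spans the matcher found. If that matters, a possible alternative is to replace by character offsets instead. The sketch below is only an illustration: redact_spans is a hypothetical helper, not part of spaCy, and the offsets are computed with str.index() as a stand-in. With spaCy they would come from span.start_char and span.end_char for each match.

```python
def redact_spans(text, char_spans, mask="[hidden]"):
    """Replace each (start, end) character range in text with mask."""
    # Merge overlapping or touching ranges so each stretch is masked once.
    merged = []
    for start, end in sorted(char_spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # Work right to left so earlier offsets stay valid after each edit.
    for start, end in reversed(merged):
        text = text[:start] + mask + text[end:]
    return text


document_string = ('Electronically signed by : John Douglas.; '
                   'Jun 13 2018 11:13AM CST, Adam Smith.')

# Stand-in offsets (assumption for this sketch); with spaCy these would be
# (span.start_char, span.end_char) for each matched span.
char_spans = []
for name in ('John Douglas', 'Adam Smith'):
    start = document_string.index(name)
    char_spans.append((start, start + len(name)))

redacted = redact_spans(document_string, char_spans)
print(redacted)
```

Merging the ranges first also covers the case where the two-token pattern matches overlapping pairs, for example "John Adam" and "Adam Smith" inside a three-word name.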
Upvotes: 2