Reputation: 159
I have two existing lists, as below:
list_a = ['one','two','three','four','five','six','seven',...]
list_content = ['This is 1st sentence with one.',
'This is 2nd sentence with seven.',
'This is 3rd sentence with one and two.',
'This is 4th sentence with three, five, and six.',...]
The idea is to find every word from list_a in each sentence of list_content and replace each exact (whole-word) match with '__'.
The output should be like this:
list_output = ['This is 1st sentence with ___.',
'This is 2nd sentence with ___.',
'This is 3rd sentence with ___ and ___.',
'This is 4th sentence with ___, ___, and ___.',...]
My attempt using re.sub:
for each_sent in list_content:
    for word in list_a:
        result = re.sub(r'\b' + word + r'\b', '__', each)
        print result
The words are not replaced as shown in the expected output.
Upvotes: 0
Views: 84
Reputation: 5101
Use the python-textops package:
from textops import *
print list_content >> sed('|'.join(list_a),'__')
Upvotes: 2
Reputation: 19831
How about doing it without any loops (https://regex101.com/r/pvwuUw/1)?
In [4]: sep = "||||"
In [5]: re.sub(r'\b(?:' + '|'.join(list_a) + r')\b', '__', sep.join(list_content)).split(sep)
Out[5]:
['This is 1st sentence with __.',
'This is 2nd sentence with __.',
'This is 3rd sentence with __ and __.',
'This is 4th sentence with __, __, and __.']
The idea is to join list_content with a separator, run the replacement once on the combined string, and then split the result on the same separator. The separator must not occur in any of the sentences.
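As a minimal sketch of the join-and-split idea (written for Python 3, so print is a function), the snippet below also shows why the alternatives are best wrapped in a non-capturing group: without one, \b attaches only to the first and last alternatives, so a middle word such as 'two' can match inside a longer word. The 'network' sentence is made up for the illustration, and the separator is assumed never to appear in the data.

```python
import re

list_a = ['one', 'two', 'three', 'four', 'five', 'six', 'seven']
list_content = ['This is 1st sentence with one.',
                'This is 3rd sentence with one and two.',
                'The network is down.']  # hypothetical sentence, not from the question

sep = "||||"  # assumed never to occur inside any sentence

# Ungrouped: parsed as  \bone | two | ... | seven\b
ungrouped = r'\b' + '|'.join(list_a) + r'\b'
# Grouped: \b applies to every alternative
grouped = r'\b(?:' + '|'.join(list_a) + r')\b'

joined = sep.join(list_content)
loose_output = re.sub(ungrouped, '__', joined).split(sep)
list_output = re.sub(grouped, '__', joined).split(sep)

print(loose_output)  # 'two' is also replaced inside 'network'
print(list_output)   # only whole words are replaced
```

With the grouped pattern, the last sentence comes back unchanged.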
Upvotes: 2
Reputation: 81
Avoid a loop inside a loop. I wrote this with performance in mind; note the raw strings, since a plain '\b' is a backspace character rather than a word boundary:
re_str_a = re.compile(r'\b' + r'\b|\b'.join(list_a) + r'\b')
for each in list_content:
    print re_str_a.sub('___', each)
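A possible variation on the same compile-once approach, sketched for Python 3: the words are passed through re.escape, which only matters if list_a ever contains regex metacharacters (an assumption about future data, not something the question shows), and the results are collected in a list instead of being printed one by one.

```python
import re

list_a = ['one', 'two', 'three', 'four', 'five', 'six', 'seven']
list_content = ['This is 1st sentence with one.',
                'This is 2nd sentence with seven.',
                'This is 3rd sentence with one and two.',
                'This is 4th sentence with three, five, and six.']

# Compile once; re.escape makes each word a literal,
# and the group puts \b on both sides of every alternative.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, list_a)) + r')\b')

# One pass per sentence, no inner loop over the words.
list_output = [pattern.sub('___', sentence) for sentence in list_content]
print(list_output)
```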
Upvotes: 3
Reputation: 2006
This should work:
import re

list_a = ['one', 'two', 'three', 'four', 'five', 'six', 'seven']
list_content = ['This is 1st sentence with one.',
                'This is 2nd sentence with seven.',
                'This is 3rd sentence with one and two.',
                'This is 4th sentence with three, five, and six.']

list_output = []
for each_sent in list_content:
    # Reassign each_sent so every replacement builds on the previous one
    for word in list_a:
        each_sent = re.sub(r'\b' + word + r'\b', '__', each_sent)
    list_output.append(each_sent)
print list_output
Output:
['This is 1st sentence with __.', 'This is 2nd sentence with __.', 'This is 3rd sentence with __ and __.', 'This is 4th sentence with __, __, and __.']
Upvotes: 3