Htet

Reputation: 159

Find a word of a list in a sentence of another list and replace it in Python 2.7

I have two existing lists, as below:

list_a = ['one','two','three','four','five','six','seven',...]

list_content = ['This is 1st sentence with one.',
'This is 2nd sentence with seven.',
'This is 3rd sentence with one and two.',
'This is 4th sentence with three, five, and six.',...]

The idea is to find each word from list_a in every sentence of list_content and replace each exact match with '__'.

The output should be like this:

list_output = ['This is 1st sentence with ___.',
'This is 2nd sentence with ___.',
'This is 3rd sentence with ___ and ___.',
'This is 4th sentence with ___, ___, and ___.',...]

My attempt using re.sub:

for each_sent in list_content:
  for word in list_a:
     result = re.sub(r'\b' + word + r'\b', '__', each)
  print result

The words don't get replaced as in the expected output.

Upvotes: 0

Views: 84

Answers (4)

Eric

Reputation: 5101

Use the python-textops package:

from textops import *
print list_content >> sed('|'.join(list_a),'__')

Upvotes: 2

AKS

Reputation: 19831

How about without any loops (https://regex101.com/r/pvwuUw/1):

In [4]: sep = "||||"

In [5]: re.sub(r'\b(?:' + '|'.join(list_a) + r')\b', '__', sep.join(list_content)).split(sep)
Out[5]: 
['This is 1st sentence with __.',
 'This is 2nd sentence with __.',
 'This is 3rd sentence with __ and __.',
 'This is 4th sentence with __, __, and __.']

The idea is to join list_content with a separator, run the replacement once, and then split the string on the same separator.
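
The join-and-split idea can be sketched as a small helper. This is only an illustration: the name `mask_words` and its `token` and `sep` parameters are hypothetical, and the non-capturing group plus `re.escape` are additions that are not in the snippet above. The group makes `\b` apply to every alternative, so words embedded in longer words are left alone. It also assumes the separator never occurs in any sentence.

```python
import re

def mask_words(sentences, words, token='__', sep='||||'):
    # Hypothetical helper: join the sentences, substitute once, split again.
    # Assumes `sep` does not appear inside any sentence.
    # (?:...) keeps \b attached to every alternative; re.escape treats
    # each word literally in case it contains regex metacharacters.
    pattern = r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b'
    return re.sub(pattern, token, sep.join(sentences)).split(sep)

list_a = ['one', 'two', 'three', 'four', 'five', 'six', 'seven']
list_content = ['This is 1st sentence with one.',
                'This is 3rd sentence with one and two.',
                'Someone sent twofold.']

result = mask_words(list_content, list_a)
print(result)
```

The third sentence is left unchanged because "one" and "two" only appear there as parts of longer words.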

Upvotes: 2

Rami

Reputation: 81

Avoid a loop inside a loop. I wrote this with performance in mind:

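import re

# Raw strings are required: in a normal string '\b' is a backspace, not a word boundary.
re_str_a = re.compile(r'\b' + r'\b|\b'.join(list_a) + r'\b')
for each in list_content:
   print re_str_a.sub('___', each)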
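
One detail worth checking with a precompiled pattern like this is how `\b` is written. A minimal sketch, assuming a shortened word list and made-up sample sentences (the names `pattern` and `masked` are illustrative only): in a normal string literal `'\b'` is the backspace character, while `r'\b'` is the regex word-boundary escape.

```python
import re

# In a normal string literal, '\b' is a single backspace character (\x08).
# In a raw string, r'\b' is two characters, which re reads as a word boundary.
assert '\b' == '\x08'
assert len(r'\b') == 2

list_a = ['one', 'two', 'three']

# Compile once, outside the loop, and reuse the pattern for every sentence.
pattern = re.compile(r'\b(?:' + '|'.join(list_a) + r')\b')
masked = [pattern.sub('___', s) for s in ['with one and two.', 'someone']]
print(masked)
```

Compiling the alternation once keeps the work to a single pass per sentence instead of one pass per word.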

Upvotes: 3

Alex Fung

Reputation: 2006

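This should work. Reassign each_sent on every pass so the replacements accumulate, then append the finished sentence to the output list: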

import re

list_a = ['one','two','three','four','five','six','seven',]

list_content = ['This is 1st sentence with one.',
'This is 2nd sentence with seven.',
'This is 3rd sentence with one and two.',
'This is 4th sentence with three, five, and six.',]
list_output = []
for each_sent in list_content:
    for word in list_a:
        each_sent = re.sub(r'\b' + word + r'\b', '__', each_sent)
    list_output.append(each_sent)
print list_output

Output:

['This is 1st sentence with __.', 'This is 2nd sentence with __.', 'This is 3rd sentence with __ and __.', 'This is 4th sentence with __, __, and __.']

Upvotes: 3
