user8871463

Reputation:

Identify certain word phrases in a string in Python

I have a list of word phrases and a string as follows.

mylist = ['and rock', 'shake well', 'the']
mystring = "the sand rock need to be mixed and shake well"

I want to replace the words in mylist with "".

I am currently using the replace method in Python as follows.

for item in mylist:
        mystring = mystring.replace(item, "")

However, I noticed that it does not work well for all my sentences. For example, in mystring it produces a false match inside "sand rock", and the output is as follows.

  s  need to be mixed and

However, I want the output to be:

sand rock need to be mixed and

Is there a better way of doing this in Python?

Upvotes: 1

Views: 404

Answers (3)

Transhuman

Reputation: 3547

Use re.sub with \b (word boundary) on both sides of the alternation so that only whole words or phrases match.

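import re
# Raw strings keep \b as a word boundary (in a plain string '\b' is a backspace);
# the group makes both boundaries apply to every alternative.
pattern = r'\b(?:' + '|'.join(map(re.escape, mylist)) + r')\b'
re.sub(pattern, '', mystring)
#' sand rock need to be mixed and '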
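
Two details are easy to get wrong when building a pattern like this. The sketch below assumes mylist is as in the question; the test sentence "bathe the dog" is made up for illustration. In a non-raw string literal, '\b' is the backspace character rather than a word boundary, and without a group the \b binds only to the first alternative.

```python
import re

mylist = ['and rock', 'shake well', 'the']

# In a plain string literal, '\b' is a backspace (0x08), not a regex word boundary.
assert '\b' == '\x08'
assert r'\b' == '\\b'

# Ungrouped: \b applies only to 'and rock'; 'shake well' and 'the' match anywhere.
ungrouped = r'\b' + '|'.join(mylist)
# Grouped: both boundaries apply to every alternative.
grouped = r'\b(?:' + '|'.join(mylist) + r')\b'

ungrouped_result = re.sub(ungrouped, '', "bathe the dog")  # also eats the end of "bathe"
grouped_result = re.sub(grouped, '', "bathe the dog")      # removes only the word "the"
print(repr(ungrouped_result), repr(grouped_result))
```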

Upvotes: 0

Craig

Reputation: 4855

The problem is that str.replace() doesn't allow you to specify that you only want to match whole words (or phrases). The re module allows you to use regular expressions (regex) for pattern matching. With regex, you can specify word boundaries using the \b escape. Place the \b escape before and after your phrases to cause the match to only occur at word boundaries. The re.sub() function works like the str.replace() method and you can use it in your code like:

import re
mylist = ['and rock', 'shake well', 'the']
mystring = "the sand rock need to be mixed and shake well"
for item in mylist:
        mystring = re.sub(r"\b{}\b".format(re.escape(item)), "", mystring)
print(mystring)

Out[6]: ' sand rock need to be mixed and '
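
The result above still carries the leading and trailing spaces left behind by the removed phrases. As a rough sketch (remove_phrases is a hypothetical helper name, not part of the re module), the loop can be folded into one compiled pattern and the leftover whitespace collapsed, which gives the output asked for in the question:

```python
import re

def remove_phrases(text, phrases):
    """Remove whole-word occurrences of each phrase, then tidy whitespace."""
    # Longest first so a longer phrase wins over a shorter one sharing its prefix.
    ordered = sorted(phrases, key=len, reverse=True)
    # re.escape guards against phrases containing regex metacharacters.
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b")
    stripped = pattern.sub("", text)
    # Collapse runs of whitespace left behind and trim the ends.
    return " ".join(stripped.split())

mylist = ['and rock', 'shake well', 'the']
mystring = "the sand rock need to be mixed and shake well"
result = remove_phrases(mystring, mylist)
print(result)
```

This assumes every phrase starts and ends with a word character; \b will not behave as a phrase delimiter next to punctuation such as the "+" in "c++".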

Upvotes: 3

GaryMBloom

Reputation: 5682

Part of the trick of your problem is that you don't want to match partial words. That's why the replace() method does not do what you want it to do. You can achieve what you want through regular expressions. One of the nice things about REs is that you can match on word boundaries using the \b anchor.
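
A minimal illustration of the word-boundary behaviour, using the 'and rock' phrase from the question (the two test strings are made up for the example):

```python
import re

phrase = "and rock"
pattern = re.compile(r"\b" + re.escape(phrase) + r"\b")

# Inside "sand rock" the 'a' is preceded by 's', so there is no boundary and no match.
inside_word = pattern.search("the sand rock")
# Here the phrase stands alone between spaces, so both boundaries hold.
whole_phrase = pattern.search("sand and rock")

print(inside_word, whole_phrase)
```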

Upvotes: 2
