Souvik Ray

Reputation: 3018

How to look for a substring in a string with certain rules using regex?

I have a string like below

Hello there how are you?

I want to look for the substring 'there how' in the string. So I would do something like this

import re
string = "Hello there how are you?"
term = "there how"
print(re.search(r"\s" + term + r"\s", string).group(0))  # \s is used to ensure the match is an independent phrase

But now the problem is, if I get a variation of the string, then the match doesn't occur. For example for strings like this

If there is a large amount of space between the words

Hello there         how are you?

If certain letters are capitalized

Hello There How are you?

What I want is that, as long as the substring 'there how' is present in the string as a separate phrase (not like Hellothere how are you? or Hello there howare you? etc.), I should be able to find a match.

How can I achieve the objective?

Upvotes: 2

Views: 151

Answers (1)

Wiktor Stribiżew

Reputation: 626804

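You may replace the spaces in the term with \s+ and use case-insensitive matching by passing the re.I flag. The (?<!\S) and (?!\S) lookarounds act as whitespace boundaries, so the phrase must be preceded and followed by whitespace or by the start/end of the string: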

import re
ss = ["Hello there how are you?", "Hello there         how are you?", "Hello There How are you?"]
term = "there how"
rx = re.compile(r"(?<!\S){}(?!\S)".format(term.replace(r" ", r"\s+")), re.I)

for s in ss:
    m = rx.search(s)
    if m:
        print(m.group())

Output:

there how
there         how
There How

See the Python demo
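
As a quick check of the cases from the question that should not match, here is a minimal sketch that reuses the same pattern. The test strings and the names glued, edges, glued_results and edge_results are only illustrative. It also shows the phrase matching at the very start or end of a string, which a pattern wrapped in plain \s on both sides would miss:

```python
import re

term = "there how"
# Same pattern as above: whitespace boundaries via lookarounds, flexible inner spacing
rx = re.compile(r"(?<!\S){}(?!\S)".format(term.replace(" ", r"\s+")), re.I)

# The phrase is glued to a neighbouring word here, so no match is expected
glued = ["Hellothere how are you?", "Hello there howare you?"]
glued_results = [rx.search(s) is not None for s in glued]
print(glued_results)  # [False, False]

# The phrase sits at the start or end of the string and still matches
edges = ["there how are you?", "Hello THERE   how"]
edge_results = [rx.search(s).group() for s in edges]
print(edge_results)  # ['there how', 'THERE   how']
```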

NOTE: If the term can contain special regex metacharacters, you need to re.escape the term, but do it before replacing spaces with \s+. Since spaces are escaped with re.escape, you need to .replace(r'\ ', r'\s+'):

rx = re.compile(r"(?<!\S){}(?!\S)".format(re.escape(term).replace(r"\ ", r"\s+")), re.I)
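
If this is needed in several places, the escaping step can be wrapped in a small helper. The sketch below is only one possible arrangement: phrase_regex is a hypothetical name, and instead of the replace call above it splits the term on whitespace, escapes each word and joins the words with \s+. This should behave the same for single-space terms and also tolerates repeated spaces inside the term itself:

```python
import re

def phrase_regex(term):
    """Hypothetical helper: compile a whole-phrase, case-insensitive pattern."""
    # Escape each word separately, then join with \s+ so that any run of
    # whitespace between the words is accepted
    words = [re.escape(w) for w in term.split()]
    return re.compile(r"(?<!\S)" + r"\s+".join(words) + r"(?!\S)", re.I)

# A term with regex metacharacters (+, parentheses) is matched literally
rx = phrase_regex("c++ (beta)")
hit = bool(rx.search("I use C++    (Beta) daily"))
miss = bool(rx.search("I use c++ (beta)x daily"))
print(hit, miss)  # True False
```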

JavaScript solution:

var ss = ["Hello there how are you?", "Hello there         how are you?", "Hello There How are you?"];
var term = "there how";
var rx = new RegExp("(?<!\\S)" + term.replace(/ /g, "\\s+") + "(?!\\S)", "i");
for (var i=0; i<ss.length; i++) {
    var m = ss[i].match(rx) || "";
    console.log(m[0]);
}

Upvotes: 2
