Reputation: 893

Python Regex: Matching a phrase regardless of intermediate spaces

Given a phrase in a given line, I need to be able to match that phrase even if the words have a different number of spaces in the line.

Thus, if the phrase is "the quick brown fox" and the line is "the           quick      brown        fox jumped over the lazy dog", the instance of "the quick brown fox" should still be matched.

The method I already tried was to replace all instances of whitespace in the line with a regex pattern for whitespace, but this doesn't always work if the line contains characters that aren't treated as literal by regex.

Upvotes: 0

Answers (5)

Felix

Reputation: 1905

As you later clarified, you need to match any line and any series of words. To show what the two similar regexes proposed here do, I added some more examples:

text = """the           quick      brown        fox
another line                    with single and multiple            spaces
some     other       instance     with        six                      words"""

Matching whole lines

The first pattern matches each whole line, iterating over the lines one at a time:

import re

# One or more words, each followed by whitespace or the end of the line
pattern1 = re.compile(r'((?:\w+)(?:\s+|$))+')
for i, line in enumerate(text.split('\n')):
    match = pattern1.match(line)
    print(i, match.group(0))

Its output is:

0 the           quick      brown        fox
1 another line                    with single and multiple            spaces
2 some     other       instance     with        six                      words

Matching single words

The second pattern matches single words and iterates over them one by one within each line:

pattern2 = re.compile(r'(\w+)(?:\s+|$)')
for i, line in enumerate(text.split('\n')):
    for m in re.finditer(pattern2, line):
        print(m.group(1))
    print()

Its output is:

the
quick
brown
fox

another
line
with
single
and
multiple
spaces

some
other
instance
with
six
words

Upvotes: 0

ggcarmi

Reputation: 468

For the general case:

Replace each run of whitespace characters in the line with a single space.

Check whether the given phrase is a substring of the line after the replacement.

import re

phrase = "your phrase"

def contains_phrase(line, phrase):
    # Collapse every run of whitespace into a single space
    normalized_line = re.sub(r'\s+', ' ', line)
    # Plain substring test against the normalized line
    return phrase in normalized_line

Upvotes: 0

blhsing

Reputation: 106523

You can split the given string by white spaces and join them back by a white space, so that you can then compare it to the phrase you're looking for:

s = "the           quick      brown        fox"
' '.join(s.split()) == "the quick brown fox" # returns True
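
The comparison above tests the whole string for equality, so it would be False for a longer line such as the one in the question. A minimal sketch of a containment check built on the same split-and-join idea follows; the helper names `normalize_spaces` and `line_contains_phrase` are made up for illustration:

```python
def normalize_spaces(s):
    # str.split() with no arguments splits on any run of whitespace
    # and drops leading and trailing whitespace
    return ' '.join(s.split())

def line_contains_phrase(line, phrase):
    # Normalize both sides, then do a plain substring test
    return normalize_spaces(phrase) in normalize_spaces(line)

line = "the           quick      brown        fox jumped over the lazy dog"
print(line_contains_phrase(line, "the quick brown fox"))
```

Note that this is a plain substring test, so a phrase like "he quick" would also be found inside "the quick"; if whole-word matching matters, a regex with word boundaries is probably the better fit.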

Upvotes: 0

YusufUMS

Reputation: 1493

You can use this regex:

(the\s+quick\s+brown\s+fox)
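
As a rough usage sketch, the pattern can be applied with `re.search`, which scans the whole line instead of only its start. The sample line here is an assumption made up for illustration:

```python
import re

pattern = re.compile(r'(the\s+quick\s+brown\s+fox)')

# Hypothetical line where the phrase does not start at position 0
line = "yesterday the   quick brown      fox jumped"

m = pattern.search(line)
if m:
    # group(1) is the matched text, span() gives its start and end offsets
    print(m.group(1), m.span())
```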

Upvotes: 0

DDGG

Reputation: 1241

This should work:

import re

pattern = r'the\s+quick\s+brown\s+fox'
text = 'the           quick      brown        fox jumped over the lazy dog'

match = re.match(pattern, text)
print(match.group(0))

The output is:

the           quick      brown        fox
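
Two caveats: `re.match` only matches at the start of the string, and a hand-written pattern does not cover phrases that contain regex metacharacters, which is the problem the question mentions. One possible way around both, sketched here with a hypothetical helper `phrase_pattern`, is to escape each word with `re.escape`, join the words with `\s+`, and use `search`:

```python
import re

def phrase_pattern(phrase):
    # Escape each word so characters such as '.', '?' or '(' are matched
    # literally, then allow one or more whitespace characters between words
    words = phrase.split()
    return re.compile(r'\s+'.join(re.escape(w) for w in words))

# Sample text made up for illustration
text = 'is   it (really)    $5.00?   yes'

# search() finds the phrase anywhere in the text, not only at the start
m = phrase_pattern('(really) $5.00?').search(text)
print(m.group(0) if m else None)
```

Escaping word by word, instead of escaping the whole phrase and then replacing spaces, avoids depending on how `re.escape` treats the space character.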

Upvotes: 1
