Reputation: 893
Given a phrase and a line, I need to be able to match the phrase in the line even if the words are separated by a different number of spaces in the line.
Thus, if the phrase is "the quick brown fox" and the line is "the quick   brown  fox jumped over the lazy dog", the instance of "the quick   brown  fox" should still be matched.
The method I have already tried was to replace every run of whitespace with a regex pattern for whitespace, but this doesn't always work if the line contains characters that regex doesn't treat as literals.
Upvotes: 0
Views: 409
Reputation: 1905
As you later clarified, you need to match any line and any series of words. I added some more examples to show what the two proposed, similar regexes do:
text = """the quick brown fox
another line with single and multiple spaces
some other instance with six words"""
The first one matches the whole line while iterating over the individual lines:
import re

pattern1 = re.compile(r'((?:\w+)(?:\s+|$))+')
for i, line in enumerate(text.split('\n')):
    # match the whole run of words, including the whitespace between them
    match = pattern1.match(line)
    print(i, match.group(0))
Its output is:
0 the quick brown fox
1 another line with single and multiple spaces
2 some other instance with six words
The second one matches single words and iterates over them one by one, while also iterating over the individual lines:
pattern2 = re.compile(r'(\w+)(?:\s+|$)')
for line in text.split('\n'):
    for m in pattern2.finditer(line):
        # group(1) is the word without the whitespace that follows it
        print(m.group(1))
    # blank line between the words of two consecutive lines
    print()
Its output is:
the
quick
brown
fox

another
line
with
single
and
multiple
spaces

some
other
instance
with
six
words
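Building on the word-by-word idea, one way to test for a phrase could be to compare word lists, so the spacing never enters the comparison. This is only a sketch and not one of the two patterns above: has_phrase is a made-up name, and \w+ drops punctuation, which may or may not be what you want.

```python
import re

WORD = re.compile(r'\w+')

def has_phrase(phrase, line):
    # Hypothetical helper: look for the phrase's words as a contiguous
    # run inside the line's words, ignoring how they are separated.
    wanted = WORD.findall(phrase)
    words = WORD.findall(line)
    n = len(wanted)
    return any(words[i:i + n] == wanted for i in range(len(words) - n + 1))

print(has_phrase("single and multiple", "another line   with single  and multiple spaces"))
```

Because only whole words are compared, a phrase cannot match in the middle of a longer word.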
Upvotes: 0
Reputation: 468
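For the general case, collapse every run of whitespace in the line into a single space, then check whether the phrase is a substring of the result:
import re

phrase = "the quick brown fox"
# `lines` is assumed to be an iterable of strings, e.g. an open file
for line in lines:
    # replace each run of whitespace with a single space
    normalized_line = re.sub(r'\s+', ' ', line)
    # True if the phrase occurs in the normalized line
    print(phrase in normalized_line)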
Upvotes: 0
Reputation: 106523
You can split the given string on whitespace and join the pieces back with a single space, so that you can then compare the result to the phrase you're looking for:
s = "the  quick   brown fox"
' '.join(s.split()) == "the quick brown fox" # returns True
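To check whether a line contains the phrase rather than equals it, the same normalization could be applied to both the phrase and the line, followed by a substring test. A minimal sketch; normalize and contains_phrase are hypothetical helper names, and a plain substring test will also accept a match that starts or ends in the middle of a word:

```python
def normalize(text):
    # Collapse every run of whitespace into a single space and trim the ends.
    return ' '.join(text.split())

def contains_phrase(phrase, line):
    # Hypothetical helper: substring test on the normalized strings.
    return normalize(phrase) in normalize(line)

line = "the  quick   brown fox jumped over the lazy dog"
print(contains_phrase("the quick brown fox", line))
print(contains_phrase("the lazy cat", line))
```

No regex is involved here, so characters such as ( or $ in the phrase need no special handling.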
Upvotes: 0
Reputation: 1241
This should work:
import re
pattern = r'the\s+quick\s+brown\s+fox'
text = 'the quick   brown  fox jumped over the lazy dog'
match = re.match(pattern, text)
print(match.group(0))
The output is:
the quick   brown  fox
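If the phrase may contain characters that regex treats as special (the problem mentioned in the question), one possible approach is to escape each word with re.escape before joining the words with \s+. A minimal sketch: the helper name phrase_to_pattern and the sample strings are made up for illustration, and re.search is used instead of re.match so that the phrase can appear anywhere in the line.

```python
import re

def phrase_to_pattern(phrase):
    # Hypothetical helper: escape every word so it is matched literally,
    # then allow any run of whitespace between consecutive words.
    words = phrase.split()
    return re.compile(r'\s+'.join(re.escape(word) for word in words))

line = 'is it   (really)  $5.00?   yes'
match = phrase_to_pattern('(really) $5.00?').search(line)
if match:
    print(match.group(0))
```

Escaping word by word keeps the \s+ separators as real regex syntax while everything else in the phrase is matched literally.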
Upvotes: 1