Reputation: 7620
When I use the re.search() function to find matches in a block of text, it stops as soon as it finds the first match.
How do I keep searching until ALL matches have been found? Is there a separate function for this?
Upvotes: 553
Views: 584403
Reputation: 23421
Another method (a bit in keeping with OP's initial spirit, albeit 13 years later) is to compile the pattern, call search() on the compiled pattern, and move along the string one position at a time. This is a bit verbose, but if you don't want a lookahead etc., or you want to step through the string more explicitly, you can use the following function.
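import re

def find_all_matches(pattern, string, group=0):
    pat = re.compile(pattern)
    pos = 0
    out = []
    # The walrus operator (:=) requires Python 3.8+
    while m := pat.search(string, pos):
        # Resume one character past the start of this match,
        # so overlapping matches are found as well
        pos = m.start() + 1
        out.append(m[group])
    return out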
pat = r'all (.*?) are'
s = 'all cats are smarter than dogs, all dogs are dumber than cats'
find_all_matches(pat, s) # ['all cats are', 'all dogs are']
find_all_matches(pat, s, group=1) # ['cats', 'dogs']
This works for overlapping matches too:
find_all_matches(r'(\w\w)', "hello") # ['he', 'el', 'll', 'lo']
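For comparison, here is a minimal sketch of the lookahead alternative mentioned above. The helper name find_overlapping is made up for illustration; it assumes the pattern can safely be wrapped in a zero-width lookahead, so each match consumes no characters and the regex engine retries at every position.

```python
import re

def find_overlapping(pattern, string):
    # Hypothetical helper: wrap the pattern in a lookahead with one outer
    # capturing group. The lookahead itself matches the empty string, so
    # the scan advances one character at a time and overlaps are kept.
    lookahead = re.compile(f'(?=({pattern}))')
    return [m.group(1) for m in lookahead.finditer(string)]

pairs = find_overlapping(r'\w\w', 'hello')
print(pairs)  # ['he', 'el', 'll', 'lo']
```

Like the search() loop, this reports at most one match per starting position.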
Upvotes: 11
Reputation: 8884
If you are interested in getting all matches (including overlapping matches, unlike @Amber's answer), there is a new library called REmatch which is specifically designed to produce all the matches of a regex on a text, including all overlapping matches. The tool supports a more general language of regular expressions with captures, called REQL.
For instance, the regexp !x{...} will give all triples of contiguous characters (including overlapping triples).
The approach should be more efficient than @cottontail's answer (which is, in general, quadratic in the length of the input string).
You can try REmatch out online here and get the Python code here.
Disclaimer: I know the authors of the tool. :)
Upvotes: 0
Reputation: 527368
Use re.findall or re.finditer instead.
re.findall(pattern, string) returns a list of matching strings (or, if the pattern contains capturing groups, a list of the captured groups).
re.finditer(pattern, string) returns an iterator over match objects.
Example:
re.findall(r'all (.*?) are', 'all cats are smarter than dogs, all dogs are dumber than cats')
# Output: ['cats', 'dogs']
[x.group() for x in re.finditer(r'all (.*?) are', 'all cats are smarter than dogs, all dogs are dumber than cats')]
# Output: ['all cats are', 'all dogs are']
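A slightly longer sketch of the same idea, in case you also need to know where each match sits in the text. The variable names and the second pattern are just for illustration. Note that with more than one capturing group, re.findall returns a list of tuples.

```python
import re

text = 'all cats are smarter than dogs, all dogs are dumber than cats'
pattern = re.compile(r'all (.*?) are')

# finditer yields match objects, so the full match, the captured
# group and the (start, end) position are all available.
found = [(m.group(0), m.group(1), m.span()) for m in pattern.finditer(text)]
for whole, inner, (start, end) in found:
    print(whole, inner, start, end)

# With two capturing groups, findall returns one tuple per match.
tuples = re.findall(r'all (\w+) are (\w+)', text)
print(tuples)  # [('cats', 'smarter'), ('dogs', 'dumber')]
```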
Upvotes: 905