Piotr Lopusiewicz

Reputation: 2594

Finding first N occurrences of regex in Python

This should be easy, but I can't find the answer on SO or in the Python docs. I am using this code:

myregex.findall(source)

This returns all matches of myregex as a list. The problem is that source is long and I only need the first 6 occurrences of a substring matching myregex. I imagine it would be much faster if the matching process could stop after finding the first n occurrences. How do I do something like:

myregex.findall(source, n)

?

Upvotes: 0

Views: 1174

Answers (2)

pradyunsg

Reputation: 19406

Since you want performance, use the compiled pattern's finditer method, which finds matches lazily instead of building the full list:

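def my_find(regex, s, n):
    # finditer returns an iterator; matches are found one at a time
    matches = regex.finditer(s)
    # next(it) works on Python 2.6+ and 3; raises StopIteration if fewer than n matches exist
    return [next(matches).groups() for i in range(n)]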

Or a safer version:

def my_find(regex, s, n):
    matches = regex.finditer(s)
    ret_val = []
    for i in range(n):
        try:
            # next(it) replaces the Python 2-only it.next()
            ret_val.append(next(matches).groups())
        except StopIteration:
            # Fewer than n matches: return what was found
            break
    return ret_val
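
One thing to watch for with the versions above: .groups() returns only the capture groups, so a pattern with no parentheses gives you empty tuples. Here is a minimal sketch that collects the full matched strings instead, assuming that is what you want for a group-less pattern. The name first_n_strings and the sample pattern are hypothetical, not from the question.

```python
import re

def first_n_strings(regex, s, n):
    """Return up to n full-match strings from s, stopping the scan early."""
    results = []
    for m in regex.finditer(s):
        if len(results) >= n:
            break  # enough matches collected; stop scanning
        results.append(m.group(0))  # group(0) is the whole match
    return results

digits = re.compile(r"\d+")  # hypothetical pattern with no capture groups
print(first_n_strings(digits, "a1 b22 c333 d4444", 2))  # ['1', '22']
```

If the source contains fewer than n matches, this simply returns the ones it found.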

Upvotes: 1

nneonneo

Reputation: 179402

Use re.finditer:

import itertools
import re

# pat is your pattern, text is the string to search
for m in itertools.islice(re.finditer(pat, text), 6):
    ...

re.finditer returns an iterator that yields match objects on demand, so the scan stops as soon as islice has taken its 6 items. You can get the complete match from m.group(0), or the individual capture groups from m.group(1) and up.
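
To illustrate the difference between m.group(0) and the numbered groups, here is a small sketch. The pattern and the sample text are made up for the example; only re.finditer and itertools.islice come from the answer.

```python
import itertools
import re

text = "a=1 b=2 c=3 d=4"  # hypothetical input
pairs = []
# Take only the first 2 matches; the rest of the text is never scanned
for m in itertools.islice(re.finditer(r"(\w+)=(\d+)", text), 2):
    # group(0): whole match, group(1)/group(2): the two capture groups
    pairs.append((m.group(0), m.group(1), m.group(2)))

print(pairs)  # [('a=1', 'a', '1'), ('b=2', 'b', '2')]
```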

Upvotes: 8
