Python Regular Expressions - Limit Results?

Question

I feel kind of stupid asking this, but I have written a few regular expressions to find specific businesses, addresses, and URLs in an HTML document. The problem is that I don't know which Python regular expression function I should use. When I use re.findall, I get 30 to 90 results. I want to limit it to a fixed number, say 3 or 5. Which regex function should I use to do this, or is there a parameter that stops the search once it has reached a certain number of results?

Also, is there a faster way of searching an HTML document so that my program isn't slowed down with regular expressions searching this really long "string" of text?

Thanks.

EDIT

I have Beautiful Soup and I've used it to just make things easier to read...but not to parse.

I've also used lxml...which is better/faster?

MRAB · Accepted Answer

Instead of using re.findall, use re.finditer. It returns an iterator which yields the next match on demand.

Here's an example:

>>> import re
>>> [m.group(0) for m, _ in zip(re.finditer(r"\w", "abcdef"), range(3))]
['a', 'b', 'c']
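
The same idea can also be written with itertools.islice, which stops pulling from the iterator once the limit is reached. This is only a sketch: the helper name first_matches and its parameters are made up for illustration and are not part of the re module.

```python
import re
from itertools import islice


def first_matches(pattern, text, limit):
    """Return at most `limit` matched strings (hypothetical helper)."""
    # re.finditer is lazy, so islice stops the scan after `limit` matches
    # instead of collecting every match the way re.findall does.
    return [m.group(0) for m in islice(re.finditer(pattern, text), limit)]


print(first_matches(r"\w", "abcdef", 3))  # ['a', 'b', 'c']
```

If the text contains fewer matches than the limit, the list is simply shorter; no error is raised.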
