Reputation: 2615
I feel kind of stupid asking this, but I have written a few regular expressions to find specific businesses, addresses, and URLs in an HTML document. The problem is that I don't know which Python regular expression function I should use. When I use re.findall, I get 30 to 90 results. I want to limit it to a fixed number, say 3 or 5. Which regex function should I use to do this, or is there a parameter that stops the search once it has found a certain number of results?
Also, is there a faster way of searching an HTML document so that my program isn't slowed down with regular expressions searching this really long "string" of text?
Thanks.
EDIT
I have Beautiful Soup and I've used it to just make things easier to read...but not to parse.
I've also used lxml...which is better/faster?
Upvotes: 2
Views: 1683
Reputation: 20654
Instead of using re.findall, use re.finditer. It returns an iterator that yields the next match on demand.
Here's an example:
>>> import re
>>> [m.group(0) for m, _ in zip(re.finditer(r"\w", "abcdef"), range(3))]
['a', 'b', 'c']
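If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: first_matches is a name I made up for illustration, and it uses itertools.islice in place of the zip/range trick to stop pulling from the iterator after a fixed number of matches.

```python
import re
from itertools import islice


def first_matches(pattern, text, limit):
    """Return at most `limit` matches of `pattern` in `text` (hypothetical helper)."""
    # re.finditer yields match objects lazily; islice stops requesting
    # new matches once `limit` of them have been produced, so the rest
    # of the text is not searched.
    return [m.group(0) for m in islice(re.finditer(pattern, text), limit)]


print(first_matches(r"\w", "abcdef", 3))  # prints ['a', 'b', 'c']
```

If the text contains fewer matches than the limit, you simply get a shorter list back.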
Upvotes: 1