Felix

Reputation: 619

Is there a maximum match length in Python's re module?

I've been using Python for web scraping. Everything worked like a well-oiled machine until I tried to extract a product description that happens to be very large.

Now it's not working at all, as if my regex were incorrect. Sadly, I cannot tell you which website I'm scraping, so I can't show the real example, but I know the regex itself is fine. It's something like this:

descriptionRegex = r'id="this_id">(.*)</div>\s*<div\ id="another_id"'

for found in re.findall(descriptionRegex, response):
    print(found)

The catch is that (.*) has to match 25,000+ characters.

Is there a limit on how many characters a re.findall() match can contain? Is there any way I can achieve this?

Upvotes: 0

Views: 73

Answers (1)

Robᵩ

Reputation: 168626

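You need to specify re.DOTALL in your call to .findall(). By default, . matches any character except a newline, so (.*) cannot span a description that runs over several lines. The length of the text is not the problem.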

If you run this program, it will behave as you request:

import re

# Sample input: the text to capture spans more than one line
response = '''id="this_id">
blah
</div> <div id="another_id"'''

descriptionRegex = r'id="this_id">(.*)</div>\s*<div\ id="another_id"'

# re.DOTALL lets "." match newlines as well
for found in re.findall(descriptionRegex, response, re.DOTALL):
    print(found)
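
To check that size is not the issue, here is a rough sketch that builds a multi-line description of well over 25,000 characters and runs the same kind of pattern with and without re.DOTALL. The sample text and the names description, without_flag, and with_flag are made up for illustration. The non-greedy (.*?) is my own choice, not part of the original regex; it stops at the first closing marker instead of the last one.

```python
import re

# Hypothetical stand-in for the scraped page: a multi-line description
# far longer than 25,000 characters.
description = "\n".join("line %d of the description" % i for i in range(2000))
response = 'id="this_id">' + description + '</div>\n<div id="another_id"'

# Non-greedy capture (an assumption, see above): stop at the first "</div>"
# that is followed by the next div.
pattern = r'id="this_id">(.*?)</div>\s*<div id="another_id"'

# Without the flag, "." stops at the first newline, so nothing matches.
without_flag = re.findall(pattern, response)

# With re.DOTALL, "." also matches newlines and the whole text is captured.
with_flag = re.findall(pattern, response, re.DOTALL)

print(len(description) > 25000)
print(without_flag)
print(with_flag[0] == description)
```

If the description on the real page sits on a single line, the flag makes no difference, so an empty result there would point to something else in the markup.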

Upvotes: 3
