Reputation: 619
I've been using Python for web scraping. Everything worked like a well-oiled machine until I tried to grab a product description that is really large.
Now it doesn't work at all, as if my regex were incorrect. Sadly, I can't tell you which website I'm scraping, so I can't show you the real example, but I'm fairly sure the regex itself is OK. It's something like this:
descriptionRegex = 'id="this_id">(.*)</div>\s*<div\ id="another_id"'
for found in re.findall(descriptionRegex, response):
    print(found)
The catch is that the text (.*) has to match is 25000+ characters long.
Is there a limit on the number of characters a re.findall() match can contain? Is there any way I can achieve this?
Upvotes: 0
Views: 73
Reputation: 168626
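You need to specify re.DOTALL in your call to .findall().
There is no length limit on a match. By default, . matches any character except a newline, so (.*) cannot span a description that contains line breaks.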
If you run this program, it will behave as you request:
import re
response = '''id="this_id">
blah
</div> <div id="another_id"'''
descriptionRegex = r'id="this_id">(.*)</div>\s*<div\ id="another_id"'
for found in re.findall(descriptionRegex, response, re.DOTALL):
    print(found)
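To check that length is not the issue, here is a rough sketch with a made-up description of roughly 27000 characters spread over many lines. The filler text and the variable names are my own assumptions, not taken from the real page. I also swapped in a non-greedy (.*?) here, which is not part of the original regex; it only keeps the match from running past the first closing </div> if the page has several.

```python
import re

# Hypothetical stand-in for the real page: 1000 lines of filler text,
# about 27000 characters in total, newlines included.
long_description = "\n".join(["lorem ipsum dolor sit amet"] * 1000)
response = 'id="this_id">' + long_description + '</div>\n<div id="another_id"'

# Non-greedy variant of the pattern (an assumption, see the note above).
pattern = r'id="this_id">(.*?)</div>\s*<div id="another_id"'

# Without re.DOTALL, "." stops at the first newline, so nothing matches.
without_flag = re.findall(pattern, response)

# With re.DOTALL, "." also matches newlines and the whole description is captured.
with_flag = re.findall(pattern, response, re.DOTALL)

print(len(without_flag))
print(len(with_flag), len(with_flag[0]))
```

If you would rather not pass a flag at the call site, putting (?s) at the very start of the pattern turns on the same behaviour.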
Upvotes: 3