Reputation: 245
I am new to Python and have been trying to port my PHP regex to Python, but I have run into problems with matching across multiple lines. I have searched the internet for the past couple of days and can't make sense of it. If someone could help, that would be great. Here is the regex I have written:
mlsTagRegex = re.compile("<td\swidth=\"13%\"\sclass=\"TopHeader\">(.*?)</td>", re.MULTILINE)
tdTags = mlsTagRegex.findall(output.getvalue())
print tdTags
Here is the HTML I would like it to find:
<td width="13%" class="TopHeader">
<span class="red">I WANT THIS PART</span>
</td>
and it just gives me an empty list. I'm pretty sure what I'm missing is fairly simple, but as I said, I am new to Python. Can anyone help? Thanks!
P.S. The output passed to findall is what pycurl returns, and that part of the HTML is in there.
Upvotes: 3
Views: 335
Reputation: 53879
You need to use re.DOTALL to make . match newline characters:
mlsTagRegex = re.compile(r'<td width="13%" class="TopHeader">(.*?)</td>', re.DOTALL)
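To see why the flag matters, here is a minimal sketch, assuming Python 3 and a hard-coded sample string that stands in for the real pycurl output. re.MULTILINE only changes how ^ and $ behave, so . still stops at a newline; re.DOTALL is the flag that lets the group span lines.

```python
import re

# Sample input mirroring the HTML from the question
# (an assumption; the real text comes from pycurl).
html = '''<td width="13%" class="TopHeader">
<span class="red">I WANT THIS PART</span>
</td>'''

pattern = r'<td width="13%" class="TopHeader">(.*?)</td>'

# re.MULTILINE only affects ^ and $, so . still cannot cross a newline.
multiline_matches = re.compile(pattern, re.MULTILINE).findall(html)

# re.DOTALL lets . match newlines, so the group can span several lines.
dotall_matches = re.compile(pattern, re.DOTALL).findall(html)

print(multiline_matches)  # []
print(dotall_matches)     # one match, including the surrounding newlines
```

The captured text keeps the newlines around the span, so you would probably want to call .strip() on each match.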
But really, you should avoid using regex to parse HTML. Use BeautifulSoup or lxml instead.
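As a rough illustration of the parser approach, here is a sketch that uses the standard library's html.parser as a dependency-free stand-in for BeautifulSoup or lxml (those would need far less code). It assumes Python 3; the class name TopHeaderText and the sample string are hypothetical, made up for this example.

```python
from html.parser import HTMLParser


class TopHeaderText(HTMLParser):
    """Collect the text inside <td class="TopHeader"> cells (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while inside a matching <td>
        self.cells = []  # one text string per matching cell

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested <td> tags so we only stop at the matching close.
            if tag == "td":
                self.depth += 1
        elif tag == "td" and dict(attrs).get("class") == "TopHeader":
            self.depth = 1
            self.cells.append("")

    def handle_endtag(self, tag):
        if self.depth and tag == "td":
            self.depth -= 1

    def handle_data(self, data):
        # Accumulate text only while inside a matching cell.
        if self.depth:
            self.cells[-1] += data


# Sample input mirroring the HTML from the question (an assumption).
html = '''<td width="13%" class="TopHeader">
<span class="red">I WANT THIS PART</span>
</td>'''

parser = TopHeaderText()
parser.feed(html)
parser.close()

cell_texts = [text.strip() for text in parser.cells]
print(cell_texts)  # ['I WANT THIS PART']
```

Unlike the regex, this does not depend on the exact attribute order or whitespace inside the tag.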
Upvotes: 2
Reputation: 280
Use re.DOTALL, so that the '.' character matches any character, including a newline.
Upvotes: 1