Reputation: 31
Good day. I have a small problem with a regexp.
My regexp looks like this:
rexp2 = re.findall(r'<p>(.*?)</p>', data)
And I need to grab everything inside the <p> element in:
<div id="header">
<h1></h1>
<p>
localhost OpenWrt Backfire<br />
Load: 0.00 0.00 0.00<br />
Hostname: localhost
</p>
</div>
But my code doesn't work :( What am I doing wrong?
Upvotes: 1
Views: 1430
Reputation: 74705
Statutory Warning: It is a Bad Idea to parse (X)HTML using regular expressions.
Fortunately there is a better way. To get going, first install the BeautifulSoup
module. Next, read up on the documentation. Third, code!
Here is one way to do what you are trying to do:
from BeautifulSoup import BeautifulSoup
html = """<div id="header">
<h1></h1>
<p>
localhost OpenWrt Backfire<br />
Load: 0.00 0.00 0.00<br />
Hostname: localhost
</p>
</div>"""
soup = BeautifulSoup(html)
# Print every <p> element found in the document.
for each in soup.findAll(name='p'):
    print(each)
Upvotes: 4
Reputation: 10260
You need to specify the re.DOTALL (re.S) flag so that . also matches newlines; re.M (multiline) only changes where ^ and $ match, so it does not help here. But parsing HTML with regexps isn't a particularly good idea.
It looks like you want some stats from an OpenWrt-powered router. Why don't you write a simple CGI script that outputs the required information in a machine-readable format?
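As a rough sketch of that idea, assuming Python is available on the router (it often is not on a stock OpenWrt image) and that JSON is an acceptable format: the function names collect_stats and render_response are made up for this illustration, and os.getloadavg() only works on Unix-like systems.

```python
#!/usr/bin/env python
import json
import os
import socket


def collect_stats():
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
    load1, load5, load15 = os.getloadavg()
    return {
        "hostname": socket.gethostname(),
        "load": [load1, load5, load15],
    }


def render_response(stats):
    # A CGI response is a header block, a blank line, then the body.
    return "Content-Type: application/json\r\n\r\n" + json.dumps(stats)


if __name__ == "__main__":
    print(render_response(collect_stats()))
```

The client then reads the values with json.loads() instead of scraping HTML.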
Upvotes: 0
Reputation: 66540
By default the dot does not match newline characters; use re.DOTALL:
re.findall(r'<p>(.*?)</p>', data, re.DOTALL)
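To see the difference, here is a small check against the HTML from the question (the variable names without_flag and with_flag are just for this illustration):

```python
import re

# Sample input copied from the question.
data = """<div id="header">
<h1></h1>
<p>
localhost OpenWrt Backfire<br />
Load: 0.00 0.00 0.00<br />
Hostname: localhost
</p>
</div>"""

# Without a flag, "." stops at every newline, so nothing matches.
without_flag = re.findall(r'<p>(.*?)</p>', data)

# With re.DOTALL, "." also matches "\n", so the whole paragraph is captured.
with_flag = re.findall(r'<p>(.*?)</p>', data, re.DOTALL)

print(without_flag)          # []
print(with_flag[0].strip())  # the three lines between <p> and </p>
```

The captured text still contains the <br /> tags and the surrounding newlines, so it needs some cleanup afterwards.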
Upvotes: 0
Reputation: 308783
I wouldn't recommend using regular expressions this way. Try parsing HTML with Beautiful Soup instead and walk the DOM tree.
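If installing Beautiful Soup is not an option, a similar result can be had with the standard library's html.parser (Python 3). Note that this swaps in a different technique: it is an event-driven parser rather than a DOM tree, and the ParagraphText class below is a made-up helper for this sketch, not part of any library.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collects the non-empty text lines found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # greater than 0 while inside a <p>
        self.lines = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only text that appears inside a <p>, without blank pieces.
        if self.depth:
            text = data.strip()
            if text:
                self.lines.append(text)


html = """<div id="header">
<h1></h1>
<p>
localhost OpenWrt Backfire<br />
Load: 0.00 0.00 0.00<br />
Hostname: localhost
</p>
</div>"""

parser = ParagraphText()
parser.feed(html)
parser.close()
print(parser.lines)
# ['localhost OpenWrt Backfire', 'Load: 0.00 0.00 0.00', 'Hostname: localhost']
```

Because the <br /> tags split the text into separate pieces, each line of the paragraph arrives as its own entry with no regexp cleanup needed.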
Upvotes: 1