Reputation: 11943
I've searched on my own, but can't manage to write a working regular expression.
I have an HTML file containing variables enclosed in square brackets ([]),
and I'd like to extract every name inside them.
<div id='client_info'>
<p><b>[client_name]</b><br/><b>[client_company]</b></p>
<p>[client_address]<br/>[client_CP]<br/>[client_city]</p>
</div>
This should give me a list containing "client_name", "client_company", "client_address", ...
I tried this:
vars = re.search('\[(.*)\]', html_template)
groups = vars.groups()
print groups
But it outputs ('client_name]</b><br/><b>[client_company',)
I've tried playing with ^ and $, but without success.
Thank you for your help.
Upvotes: 2
Views: 40
Reputation: 39365
Python has a very powerful library named BeautifulSoup, and I'd recommend using it to parse the HTML. Locate the div with this library first, then apply the regular expression to its contents.
import re
from bs4 import BeautifulSoup  # assumes the BeautifulSoup 4 package (bs4) is installed

html = '''
...some htmls...
<div id='client_info'>
<p><b>[client_name]</b><br/><b>[client_company]</b></p>
<p>[client_address]<br/>[client_CP]<br/>[client_city]</p>
</div>
...more htmls...
'''
soup = BeautifulSoup(html)
div = soup.find("div", {"id":"client_info"})
p = div.findAll("p")
for tag in p:
    # [^\]]* matches everything up to the next ']', so each variable is captured separately
    print re.findall(r'\[([^\]]*)\]', tag.renderContents())
There may be a way to handle the <br/> tags with BeautifulSoup directly, but I don't know it.
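As a side note, the <br/> tags probably need no special handling, since they contain no brackets and the pattern simply skips over them. If installing a third-party library is not an option, the same "restrict to the div, then run the regex" idea can be sketched with the standard library's html.parser. This swaps BeautifulSoup for a different technique, assumes Python 3, and the class name DivTextCollector and the extra [outside_var] line are made up for illustration:

```python
import re
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Hypothetical helper: collects text found inside <div id=target_id>."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # 0 means we are outside the target div
        self.chunks = []    # text nodes seen inside the target div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return          # <p>, <b>, <br/> etc. do not affect nesting depth
        if self.depth > 0:
            self.depth += 1                     # nested div inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1                      # entering the target div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


html = """
<p>[outside_var]</p>
<div id='client_info'>
<p><b>[client_name]</b><br/><b>[client_company]</b></p>
<p>[client_address]<br/>[client_CP]<br/>[client_city]</p>
</div>
"""

collector = DivTextCollector("client_info")
collector.feed(html)

# Run the bracket pattern only on text that came from inside the div.
names = re.findall(r'\[([^\]]*)\]', "".join(collector.chunks))
print(names)
```

Because only text inside the div is collected, the bracketed name placed outside it is not picked up.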
Upvotes: 1
Reputation: 149020
Either use a non-greedy quantifier, like this:
re.search('\[(.*?)\]', html_template)
Or a character class, like this:
re.search('\[([^\]]*)\]', html_template)
Then use re.findall instead of re.search to get all matching substrings; re.search returns only the first match.
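To make the difference concrete, here is a minimal sketch that runs the three patterns from this thread through re.findall on the template from the question. It assumes Python 3 (hence print as a function), and the variable names greedy, lazy and negated are just for illustration:

```python
import re

html_template = """
<div id='client_info'>
<p><b>[client_name]</b><br/><b>[client_company]</b></p>
<p>[client_address]<br/>[client_CP]<br/>[client_city]</p>
</div>
"""

# Greedy: .* runs to the LAST ']' on each line, swallowing the tags in between.
greedy = re.findall(r'\[(.*)\]', html_template)

# Non-greedy: .*? stops at the first ']' after each '['.
lazy = re.findall(r'\[(.*?)\]', html_template)

# Negated character class: the captured text may not contain ']' at all.
negated = re.findall(r'\[([^\]]*)\]', html_template)

print(greedy)
print(lazy)
print(negated)
```

The greedy version returns one long match per line, which is the behaviour seen in the question, while the other two each return the five variable names separately.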
Upvotes: 4