Loïc

Reputation: 11943

Python - re - need help for regular expression

I've searched on my own, but I can't manage to write a working regular expression.

I have an HTML file containing variables between square brackets, and I'd like to extract every name inside them.

<div id='client_info'>
    <p><b>[client_name]</b><br/><b>[client_company]</b></p>
    <p>[client_address]<br/>[client_CP]<br/>[client_city]</p>
</div>

This should give me an array containing "client_name", "client_company", "client_address", ...

I tried this:

vars = re.search('\[(.*)\]', html_template)
groups = vars.groups()
print groups

But it outputs ('client_name]</b><br/><b>[client_company',)

I've tried playing with ^ and $, but without success.

Thank you for your help.

Upvotes: 2

Views: 40

Answers (2)

Sabuj Hassan

Reputation: 39365

Python has a very powerful library named BeautifulSoup, which I'd recommend for parsing HTML. Use it to locate the div first, and then apply the regular expression to the contents of each paragraph.

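import re
from bs4 import BeautifulSoup  # assumes the bs4 package; BeautifulSoup 3 used "from BeautifulSoup import BeautifulSoup"

html = '''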
...some htmls...
<div id='client_info'>
    <p><b>[client_name]</b><br/><b>[client_company]</b></p>
    <p>[client_address]<br/>[client_CP]<br/>[client_city]</p>
</div>
...more htmls...
'''
soup = BeautifulSoup(html)
div = soup.find("div", {"id":"client_info"})
p = div.findAll("p")
for tag in p:
    print re.findall('\[([^\]]*)\]', tag.renderContents())

There may be a way to handle the <br/> tags with BeautifulSoup itself, but I don't know it.
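If installing a third-party library is not an option, the same idea (only look at text inside the client_info div, and ignore tags such as <br/>) can be sketched with the standard library's html.parser module. This swaps BeautifulSoup for HTMLParser, so it is an alternative and not the approach shown above. The class name BracketVarCollector and the extra [outside] line in the sample are made up for illustration.

```python
import re
from html.parser import HTMLParser


class BracketVarCollector(HTMLParser):
    """Hypothetical helper: collects [name] placeholders from the text
    nodes found inside the <div> with the given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0   # number of open <div> tags, counted from the target div
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return       # <p>, <b>, <br/> and so on do not affect the depth
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Only text nodes reach this method, so tags never end up in a match.
        if self.depth:
            self.names.extend(re.findall(r'\[([^\]]*)\]', data))


html = """
<p>[outside]</p>
<div id='client_info'>
    <p><b>[client_name]</b><br/><b>[client_company]</b></p>
    <p>[client_address]<br/>[client_CP]<br/>[client_city]</p>
</div>
"""

collector = BracketVarCollector("client_info")
collector.feed(html)
collector.close()
print(collector.names)
```

The [outside] placeholder is skipped because it sits before the target div opens.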

Upvotes: 1

p.s.w.g

Reputation: 149020

Either use a non-greedy quantifier, like this:

re.search('\[(.*?)\]', html_template)

Or a character class, like this:

re.search('\[([^\]]*)\]', html_template)

Then use re.findall instead of re.search to get all matching substrings; re.search only returns the first match.
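As a minimal sketch, here are both patterns run through re.findall on the template from the question, next to the original greedy pattern for comparison. It is written for Python 3, so print is a function, and the variable names greedy, lazy and negated are only illustrative.

```python
import re

html_template = """
<div id='client_info'>
    <p><b>[client_name]</b><br/><b>[client_company]</b></p>
    <p>[client_address]<br/>[client_CP]<br/>[client_city]</p>
</div>
"""

# Greedy: .* extends to the last ']' on each line, so every line
# collapses into a single oversized match.
greedy = re.findall(r'\[(.*)\]', html_template)

# Non-greedy: .*? stops at the first ']' it can reach.
lazy = re.findall(r'\[(.*?)\]', html_template)

# Character class: [^\]]* can never step over a ']'.
negated = re.findall(r'\[([^\]]*)\]', html_template)

print(greedy[0])  # client_name]</b><br/><b>[client_company
print(lazy)       # ['client_name', 'client_company', 'client_address', 'client_CP', 'client_city']
print(negated == lazy)  # True
```

One difference between the two: `.` does not match a newline unless re.DOTALL is set, while the negated class `[^\]]` does, so only the character-class version would pick up a placeholder that spans several lines.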

Upvotes: 4
