Reputation: 788
This is my string:
content = '<tr class="cart-subtotal"><th>RTO / Registration office :</th><td><span class="amount"><h5>Yadgiri</h5></span></td></tr>'
I tried the regular expression below to extract the text between the h5 tags:
reg = re.search(r'<tr class="cart-subtotal"><th>RTO / Registration office :</th><td><span class="amount"><h5>([A-Za-z0-9%s]+)</h5></span></td></tr>' % string.punctuation,content)
It returns exactly what I want.
Is there a more Pythonic way to get this?
Upvotes: 2
Views: 389
Reputation: 174
Dunno whether this qualifies as more Pythonic or not, but it handles the string as HTML data instead of matching it as raw text.
from lxml import html
content = '<tr class="cart-subtotal"><th>RTO / Registration office :</th><td><span class="amount"><h5>Yadgiri</h5></span></td></tr>'
HtmlData = html.fromstring(content)
ListData = HtmlData.xpath('//text()')
And to get the last element:
ListData[-1]
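If pulling in lxml is not an option, roughly the same idea can be sketched with the standard library's html.parser. This is only a minimal sketch, not a drop-in replacement: TagTextCollector is a made-up helper name (it is not part of any library), and it assumes the target tag is never nested inside itself.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Hypothetical helper: collect the text inside each occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.inside = False  # True while the parser is between <tag> and </tag>
        self.texts = []      # one string per occurrence of the tag

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.inside = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.inside = False

    def handle_data(self, data):
        # Only keep text that appears while inside the target tag.
        if self.inside:
            self.texts[-1] += data


content = '<tr class="cart-subtotal"><th>RTO / Registration office :</th><td><span class="amount"><h5>Yadgiri</h5></span></td></tr>'

parser = TagTextCollector("h5")
parser.feed(content)
parser.close()

office = parser.texts[0]
print(office)
```

Unlike taking the last item of every text node, this asks for the h5 contents by tag name, so it should keep working if more cells are added to the row after the h5.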
Upvotes: 2