Veera Balla Deva

Reputation: 788

How to extract text from an HTML table row

This is my string:

content = '<tr class="cart-subtotal"><th>RTO / Registration office :</th><td><span class="amount"><h5>Yadgiri</h5></span></td></tr>'

I tried the regular expression below to extract the text between the h5 tags:

   import re, string
   reg = re.search(r'<tr class="cart-subtotal"><th>RTO / Registration office :</th><td><span class="amount"><h5>([A-Za-z0-9%s]+)</h5></span></td></tr>' % re.escape(string.punctuation), content)

It returns exactly what I want.

Is there a more Pythonic way to do this?

Upvotes: 2

Views: 389

Answers (1)

Srevilo

Reputation: 174

Dunno whether this qualifies as more pythonic or not, but it handles it as HTML data.

from lxml import html
content = '<tr class="cart-subtotal"><th>RTO / Registration office :</th><td><span class="amount"><h5>Yadgiri</h5></span></td></tr>'
HtmlData = html.fromstring(content)
ListData = HtmlData.xpath('//text()')

And to get the last element:

ListData[-1]
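
If installing lxml isn't an option, roughly the same idea can be sketched with the standard library's html.parser module. This is only a minimal sketch, not a drop-in replacement: the TagTextExtractor class and the extract_tag_text helper are hypothetical names made up for this example, and it assumes you only want the text inside one known tag (h5 here).

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside every occurrence of one tag (hypothetical helper)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0   # greater than 0 while we are inside the target tag
        self.texts = []  # one entry per outermost occurrence of the target tag

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self.depth == 0:
                self.texts.append("")
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Only keep text that appears while the target tag is open.
        if self.depth:
            self.texts[-1] += data


def extract_tag_text(markup, tag):
    """Return the stripped text of each `tag` element found in `markup`."""
    parser = TagTextExtractor(tag)
    parser.feed(markup)
    parser.close()
    return [text.strip() for text in parser.texts]


content = '<tr class="cart-subtotal"><th>RTO / Registration office :</th><td><span class="amount"><h5>Yadgiri</h5></span></td></tr>'
h5_texts = extract_tag_text(content, "h5")
print(h5_texts[0])  # prints: Yadgiri
```

Unlike taking the last text node, this picks the value by tag name, so it should keep working if more cells are added after the h5 one.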

Upvotes: 2
