How to extract text from an HTML table row

Question

This is my string:

content = 'RTO / Registration office :Yadgiri'

I tried the regular expression below to extract the text between the h5 element tags:

   reg = re.search(r'RTO / Registration office :([A-Za-z0-9%s]+)' % string.punctuation, content)

It returns exactly what I want.
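
For reference, a self-contained version of that attempt might look like the sketch below. The `re.escape` call and the `pattern` and `value` names are additions for illustration, not part of the original line; escaping is assumed to be desirable because `string.punctuation` contains characters such as `]` and `\` that have special meaning inside a character class.

```python
import re
import string

content = 'RTO / Registration office :Yadgiri'

# Assumption: re.escape is added here so that punctuation such as ']' or '\'
# is matched literally instead of being read as regex syntax.
pattern = r'RTO / Registration office :([A-Za-z0-9%s]+)' % re.escape(string.punctuation)
reg = re.search(pattern, content)

# group(1) holds the text captured after the label; None if nothing matched.
value = reg.group(1) if reg else None
print(value)
```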

Is there a more Pythonic way to do this?

Srevilo · Accepted Answer

Dunno whether this qualifies as more Pythonic, but it treats the string as HTML data.

from lxml import html

content = 'RTO / Registration office :Yadgiri'
# Parse the string as an HTML fragment
HtmlData = html.fromstring(content)
# Collect every text node in document order
ListData = HtmlData.xpath('//text()')

And to get the last element:

ListData[-1]
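
If installing lxml is not an option, the same idea (collect the text nodes, then take the last one) can be sketched with the standard library's `html.parser` instead. This is a swapped-in technique, not the lxml approach above. The `TextCollector` class and the `sample` markup are hypothetical: the h5 wrapping is only a guess at what the real string looks like.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the non-empty text nodes of an HTML fragment, in document order."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        # Called for each run of text between tags; skip whitespace-only runs.
        stripped = data.strip()
        if stripped:
            self.texts.append(stripped)


# Hypothetical markup, assumed for illustration only.
sample = '<h5>RTO / Registration office :</h5>Yadgiri'

collector = TextCollector()
collector.feed(sample)
collector.close()  # flush any text still buffered by the parser

# The last text node is the value that follows the label.
last_text = collector.texts[-1]
print(last_text)
```

Like the lxml version, this relies on the wanted value being the last text node, so it would need adjusting if more markup followed it.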
