Reputation: 470
I want to extract text from an HTML page using XPath. The text I need is in the td to the right of the th containing "Description:", on the page at the URL in the code below.
In the first call (commented out) I tried the absolute XPath copied from the Chrome inspector, but it returns an empty list. The second call works, but it only returns the heading itself: "Description:"
I need a generic XPath query that takes a heading text (like "Description:") and returns the text of the td next to it.
import requests
from lxml import html

url = 'http://datrack.canterbury.nsw.gov.au/cgi/datrack.pl?cmd=download&id=ZiFfLxV6W1xHWBN1UwR5SVVSAV0GXUZUcGFGHhAyTykQAG5CWVcARwM='
page = requests.get(url)
tree = html.fromstring(page.content)
# Absolute path copied from Chrome; returns an empty list
# desc = tree.xpath('//*[@id="documentpreview"]/div[1]/table[1]/tbody/tr[2]/td//text()')
# Works, but returns only the heading text
desc = tree.xpath("//text()[contains(., 'Description:')]")
I have tried several variations of the XPath query, but my knowledge of XPath is not deep enough. Any help would be appreciated.
Upvotes: 1
Views: 92
Reputation: 880299
Use //*[contains(text(), 'Description:')] to find tags whose text contains Description:, and use following-sibling::td to find the following siblings that are td tags:
In [180]: tree.xpath("//*[contains(text(), 'Description:')]/following-sibling::td/text()")
Out[180]: ['Convert existing outbuilding into a recreational area with bathroom and kitchenette']
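Since the question asks for something generic, the same query can be wrapped in a small helper. This is only a sketch: the function name value_for_heading and the SAMPLE markup are made up for illustration and assume the real page uses a th followed by a td in the same row. The heading is passed as an XPath variable (lxml accepts these as keyword arguments), so quotes in the heading do not need escaping.

```python
from lxml import html

# Hypothetical markup standing in for the real page (assumed th/td layout).
SAMPLE = """
<table>
  <tr><th>Heading A:</th><td>first value</td></tr>
  <tr><th>Description:</th><td> Convert existing outbuilding </td></tr>
</table>
""".strip()


def value_for_heading(tree, heading):
    """Return the stripped text of the first td after the th containing `heading`, or None."""
    cells = tree.xpath(
        "//th[contains(normalize-space(.), $heading)]/following-sibling::td[1]",
        heading=heading,  # bound to $heading in the expression
    )
    return cells[0].text_content().strip() if cells else None


tree = html.fromstring(SAMPLE)
print(value_for_heading(tree, "Description:"))
```

As for the commented-out absolute path, one likely cause (not verified against this page) is the tbody step: browsers insert tbody elements when building the DOM, while lxml parses the raw HTML, which may not contain them.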
Upvotes: 2