Python: absolute XPath returns empty list; is a generic query better?

Question

I want to extract text from an HTML page using XPath. The text I need is in the td to the right of the th containing "Description:", on the page at the URL in the code below.

My first call (commented out) uses the absolute XPath copied from the Chrome inspector, but it returns an empty list. The second call works, but it returns only the heading: "Description:".

I need a generic XPath query that takes a heading text (such as "Description:") and returns the text of the td next to it.

import requests
from lxml import html

url = 'http://datrack.canterbury.nsw.gov.au/cgi/datrack.pl?cmd=download&id=ZiFfLxV6W1xHWBN1UwR5SVVSAV0GXUZUcGFGHhAyTykQAG5CWVcARwM='
page = requests.get(url)
tree = html.fromstring(page.content)

# desc = tree.xpath('//*[@id="documentpreview"]/div[1]/table[1]/tbody/tr[2]/td//text()')

desc = tree.xpath("//text()[contains(., 'Description:')]")

I have tried variations of XPath queries but my knowledge is not deep enough. Any help would be appreciated.

unutbu · Accepted Answer

Use //*[contains(text(), 'Description:')] to find elements whose text contains "Description:", then use following-sibling::td to select the td elements that follow each match as siblings:

In [180]: tree.xpath("//*[contains(text(), 'Description:')]/following-sibling::td/text()")
Out[180]: ['Convert existing outbuilding into a recreational area with bathroom and kitchenette']
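
To make this reusable for any heading, the same query can be wrapped in a small function. The following is only a sketch: value_for_heading and the SAMPLE markup are hypothetical names and made-up data for illustration, not the real page, and the heading is passed as an XPath variable (which lxml supports through keyword arguments) so it never has to be quoted by hand.

```python
from lxml import html


def value_for_heading(tree, heading):
    """Return the text of the first td following the element whose text
    contains `heading`, or None if there is no match."""
    cells = tree.xpath(
        "//*[contains(text(), $heading)]/following-sibling::td[1]",
        heading=heading,  # bound to $heading in the expression above
    )
    if not cells:
        return None
    return cells[0].text_content().strip()


# Made-up markup that imitates a heading/value table (an assumption,
# not the actual page). Note that the source contains no <tbody>.
SAMPLE = """
<html><body>
<div id="documentpreview">
  <table>
    <tr><th>Status:</th><td>Example value</td></tr>
    <tr><th>Description:</th><td>Convert existing outbuilding</td></tr>
  </table>
</div>
</body></html>
"""

tree = html.fromstring(SAMPLE)
print(value_for_heading(tree, "Description:"))

# An inspector-style path that includes tbody matches nothing here,
# because lxml does not insert a tbody that is absent from the source.
print(tree.xpath('//*[@id="documentpreview"]/table/tbody/tr[2]/td//text()'))
```

The last line hints at one likely reason the absolute path from the Chrome inspector returned an empty list: browsers add tbody elements to the DOM even when the HTML source has none, so a path copied from the inspector may not match what lxml parsed. I have not checked this against the live page, so treat it as a probable cause rather than a confirmed one.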
