Nick Syiek

Reputation: 55

Using XPath to get text within a <td> cell using python

I am currently learning how to use XPath to extract information from an HTML document. I am using Python and have had no trouble getting values such as the title of a webpage, but when I try to get the text of a particular cell in a table, I just get an empty list back.

Here is my code. I used Chrome to copy the XPath of the table cell I want to get the value from.

from lxml import html
import requests

page = requests.get('https://en.wikipedia.org/wiki/List_of_Olympic_Games_host_cities')
tree = html.fromstring(page.content)

#This will get the cell text:
location = tree.xpath('//*[@id="mw-content-text"]/div/table[1]/tbody/tr[1]/td[3]/text()')

print('Location: ', location)

Upvotes: 3

Views: 1794

Answers (2)

matisetorm

Reputation: 853

I poked around a bit.

Try: tree.xpath('//*[@id="mw-content-text"]/div/table[1]/tr/td[3]/text()')

I think the content is a bit different on a webpage rendered in Chrome vs. what was returned by requests (i.e. tbody wasn't needed, and specifying tr[1] was yielding an empty result). FYI, the XPath you provided checked out and worked fine in Chrome.

See Andersson's answer below as well, but basically, tbody may be added by the browser, so it's best not to use it in the path.
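To see the difference without hitting the network, here is a minimal sketch. The markup and the "content" id are made up for illustration (they are not the real Wikipedia source); the point is only that lxml parses what the server sent and, as far as I can tell, does not insert a tbody the way Chrome's DOM does.

```python
from lxml import html

# Hypothetical stand-in for the raw page source: the author wrote no <tbody>.
RAW = """
<div id="content">
  <table>
    <tr><th>City</th><th>Country</th><th>Continent</th></tr>
    <tr><td>Athens</td><td>Greece</td><td>Europe</td></tr>
    <tr><td>St. Louis</td><td>United States</td><td>North America</td></tr>
  </table>
</div>
"""

tree = html.fromstring(RAW)

# Path as copied from a browser's dev tools (includes tbody): matches nothing here.
with_tbody = tree.xpath('//*[@id="content"]/table[1]/tbody/tr/td[3]/text()')

# Same path with the tbody step dropped: matches the third cell of each data row.
without_tbody = tree.xpath('//*[@id="content"]/table[1]/tr/td[3]/text()')

print('With tbody:   ', with_tbody)
print('Without tbody:', without_tbody)
```

The header row contributes nothing because it only has th cells, so td[3] matches just the data rows.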

Upvotes: 2

Andersson

Reputation: 52665

You should not use the tbody tag in your XPath expressions, as it may be omitted by the page's developer and added by the browser while rendering the page. You can try the XPath below to get the required values:

location = tree.xpath('//*[@id="mw-content-text"]/div/table[1]//tr[not(parent::thead)]/td[3]/text()')

The output is

Location:  ['Europe', 'Europe', 'North America', 'Europe', 'Europe', 'Europe',
'Europe', 'Europe', 'Europe', 'Europe', 'Europe', 'North America', 'North America',
'Europe', 'Europe', 'Asia', '\nEurope', 'Asia', '\nEurope', 'Europe', 'Europe',
'Europe', 'Europe', 'Europe', 'Europe', 'Europe', 'Oceania', '\nEurope',
'North America', 'Europe', 'Europe', 'Asia', 'Europe', 'North America', 'Asia',
'Europe', 'Europe', 'North America', 'North America', 'Europe', 'Europe',
'North America', 'North America', 'Asia', 'Europe', 'Europe', 'Europe',
'North America', 'Asia', 'Oceania', 'North America', 'Europe', 'Europe', 'Asia',
'North America', 'Europe', 'Europe', 'South America', 'Asia', 'Asia', 'Asia',
'Europe', 'North America']
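Some entries in that output carry a leading newline ('\nEurope'). If that matters, one option is to strip each string after the query. The sketch below uses made-up markup and a made-up "content" id rather than the real page, and the strip step is my own addition, not something the XPath does for you. It also shows that the //tr[not(parent::thead)] form works when thead and tbody are written out explicitly.

```python
from lxml import html

# Hypothetical markup: explicit <thead>/<tbody>, a header row built from <td>
# cells, and one data cell whose text starts with a newline.
RAW = """
<div id="content">
  <table>
    <thead><tr><td>City</td><td>Country</td><td>Continent</td></tr></thead>
    <tbody>
      <tr><td>Athens</td><td>Greece</td><td>Europe</td></tr>
      <tr><td>Tokyo</td><td>Japan</td><td>
Asia</td></tr>
    </tbody>
  </table>
</div>
"""

tree = html.fromstring(RAW)

# //tr finds rows at any depth under the table; the predicate skips header rows.
cells = tree.xpath('//*[@id="content"]/table[1]//tr[not(parent::thead)]/td[3]/text()')

# Trim surrounding whitespace and drop strings that were whitespace only.
locations = [c.strip() for c in cells if c.strip()]

print('Location: ', locations)
```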

Upvotes: 3
