Steve

Reputation: 331

Fetching data from Wikipedia using Xpath Query function in Google Sheets ImportXML

What would be the correct Xpath query to use to fetch data from Wikipedia into Google Sheets?

Here is an example I would like to test this with:

Wikipedia page: http://en.wikipedia.org/wiki/12_Angry_Men_(1957_film)

Data to pull: the "Running time" value of "96 minutes", located in the table on the right-hand side of the page

Method: using the Google Sheets ImportXML function

I've tried the following, but it returns #N/A:

=IMPORTXML("http://en.wikipedia.org/wiki/12_Angry_Men_(1957_film)", "//div[normalize-space() = 'Running time']/following-sibling::td")

Thanks!

Upvotes: 1

Views: 788

Answers (1)

dartonw

Reputation: 300

There are a couple of problems with your XPath.

The following-sibling axis won't work on that page's markup, because the 'Running time' div sits inside a th, and the td you want is a sibling of that th, not of the div. Instead, use the following axis with a name test: following::td. That still selects every td that comes after the div in document order, so we also need a positional predicate to keep only the first one: [1].

Complete function with XPath:

=IMPORTXML("http://en.wikipedia.org/wiki/12_Angry_Men_%281957_film%29", "//div[normalize-space()='Running time']/following::td[1]")
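To see why the two axes behave differently, here is a rough Python sketch using only the standard library. The SAMPLE markup is a simplified, hypothetical stand-in for the infobox rows, not the real page source, and since ElementTree does not support the following or following-sibling axes, both are emulated by hand here rather than evaluated as real XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for two infobox rows (not the real page source).
SAMPLE = """
<table>
  <tr>
    <th><div>Running time</div></th>
    <td>96 minutes</td>
  </tr>
  <tr>
    <th><div>Country</div></th>
    <td>United States</td>
  </tr>
</table>
"""

def norm(el):
    """Rough equivalent of XPath normalize-space() applied to an element."""
    return " ".join("".join(el.itertext()).split())

root = ET.fromstring(SAMPLE)
parent_of = {child: parent for parent in root.iter() for child in parent}
doc_order = list(root.iter())  # all elements in document order

# //div[normalize-space()='Running time']
label = next(el for el in root.iter("div") if norm(el) == "Running time")

# Emulates following-sibling::td: td elements after the div under the same parent.
siblings = list(parent_of[label])
following_siblings = [
    s for s in siblings[siblings.index(label) + 1:] if s.tag == "td"
]

# Emulates following::td: td elements later in document order,
# excluding anything nested inside the div itself.
inside_label = set(label.iter())
later = doc_order[doc_order.index(label) + 1:]
following = [el for el in later if el.tag == "td" and el not in inside_label]

# Emulates the [1] predicate: keep only the first match.
first_following = following[0]

print(len(following_siblings))  # the div has no td siblings inside its th
print(norm(first_following))    # text of the first td after the div
```

With this sample, the sibling-based lookup finds nothing because the div is the only child of its th, while the document-order lookup reaches the td in the same row.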

Upvotes: 0
