Reputation: 3025
Using the IMPORTXML
function, is it possible to construct an XPATH query that pulls the Industry value for a given Wikipedia page?
For example, the value I want to pull from this page - https://en.wikipedia.org/wiki/Target_Corporation - is "Retail" whereas on this page - https://en.wikipedia.org/wiki/Boohoo.com - it would be "Fashion".
Upvotes: 2
Views: 338
Reputation: 201348
If my understanding is correct, as other pattern, how about the formula with this xpath? Please think of this as just one of several answers.
=IMPORTXML(A1,"//th[text()='Industry']/following-sibling::td")
//th[text()='Industry']/following-sibling::td
.https://en.wikipedia.org/wiki/Target_Corporation
or https://en.wikipedia.org/wiki/Boohoo.com
is put in the cell "A1".From your replying, I knew that you want to add 2 more URLs. So all URLs are as follows.
https://en.wikipedia.org/wiki/Target_Corporation
For above URLs, when the formula of =IMPORTXML(A1,"//th[text()='Industry']/following-sibling::td")
is used, Retail
, Fashion
, Retail
and Travel, services
are returned.
When the xpath is modified to //th[text()='Industry']/following-sibling::td/a
, Retail
, #N/A
, #N/A
and Travel
are returned.
The reason of this is due to the following difference.
<tr>
<th scope="row">Industry</th>
<td class="category"><a href="/wiki/Travel" title="Travel">Travel</a> services</td>
</tr>
and
<tr>
<th scope="row" style="padding-right:0.5em;">Industry</th>
<td class="category" style="line-height:1.35em;"><a href="/wiki/Retail" title="Retail">Retail</a></td>
</tr>
and
<tr>
<th scope="row" style="padding-right:0.5em;">Industry</th>
<td class="category" style="line-height:1.35em;">Fashion</td>
</tr>
By this, I think that unfortunately, in order to retrieve Travel
, Retail
and Fashion
from above, those cannot be directly retrieved with only one xpath. So I used a built-in function for this situation.
In this workaround, I used INDEX
. Please think of this as just one of several answers.
=INDEX(IMPORTXML(A1,"//th[text()='Industry']/following-sibling::td"),1,1)
//th[text()='Industry']/following-sibling::td
. This is not modified.INDEX
.Upvotes: 2
Reputation: 1
try:
=INDEX(IMPORTXML("https://en.wikipedia.org/wiki/Boohoo.com",
"//td[@class='category']"), 2, 1)
=INDEX(IMPORTXML("https://en.wikipedia.org/wiki/Target_Corporation",
"//td[@class='category']"),2,1)
Upvotes: 0