Reputation: 1
In google sheets, for my own amusement, I'm trying to display "Galleon in Valley of the Four Winds." as one string from the code below.
I want to do this for about 600 pages, all of which have identical structure in their HTML (without ID's). I'm only ever going to be interested in the first list between the UL tags.
<h3>Source:</h3>
<ul>
<li>
<a href='http://www.wowhead.com/npc=62346' target='_blank'>Galleon</a> in Valley of the Four Winds.
</li>
<li>
<a href='/bmah.php'>The Black Market Auction House</a> (rarely)
</li>
</ul>
There are many many lists in the source code & not always in the same order, which makes something like IMPORTHTML(B2,"list",3) hard to use.
I can get "Galleon" by itself using this
=IMPORTXML(URL, "//a[@href[starts-with(., 'http://www.wowhead.com/npc')]]")
I tried adding a "//li | " but it brought back all of the lists & not the text that I hoped for, which made sense but I'm at a loss on how to proceed further with this.
=IMPORTXML(URL, "//li | //a[@href[starts-with(.,'http://www.wowhead.com/npc')]]")
I've tried reading through guides & guidelines, but at this point I'm just floundering and a bit lost.
Hope that all made sense, many thanks in advance for the replies.
Upvotes: 0
Views: 120
Reputation: 1
Thank you!
I've rebuilt it for Google Sheets & it posts each part in a different cell, however a quick concatenation has built a complete string.
=IMPORTXML(A5,"//li[a[@href[starts-with(.,""wowhead.com/npc"")]]]")
Upvotes: 0
Reputation: 12822
This one is working on your sample
xmllint --html --xpath 'string(//li[a[@href[starts-with(., "http://www.wowhead.com/npc")]]])' test.html
Galleon in Valley of the Four Winds.
Upvotes: 1