Reputation: 23
I am trying to get a list of the data attributes from a website using Google Sheets IMPORTXML function. To clarify, I want the names of the attributes themselves as well as their values, and I don't need the text of the table itself.
Trying the website: https://thesilphroad.com/catalog
A sample of the code I want to extract from is:
<div class="pokemonOption sighted" data-nests="1" data-raid-boss="0" data-obtainable="1" data-released="1" data-shiny-obtainable="1" data-shiny-released="1" data-shadow-available="" data-shadow-released="1" data-pokemon-slug="bulbasaur" style="background-image:url(https://assets.thesilphroad.com/img/pokemon/icons/96x96/1.png), radial-gradient(#a9f712, #2ecc71);"><span>#001</span></div>
The formula I'm using is:
=importxml("https://thesilphroad.com/catalog","//div[@class='pokemonOption sighted']/@*
and it returns all the values of the attributes in 1 column, eg:
attributes |
---|
pokemonOption sighted |
1 |
1 |
Bulbasaur |
etc... |
But what I need is the names as well, eg data-nests="1". Alternatively, what about just a list of the attributes of that , eg.
attribute names |
---|
class |
data-nests |
data-raid-boss |
data-released |
etc... |
Does someone know how to extract this into Sheets using IMPORTXML or another method?
Thanks!
Upvotes: 1
Views: 659
Reputation: 5852
I am not so sure, but I think IMPORTXML
uses XPath 1.0
.
To get all attributes, you need XPath 2.0
.
You may want to create a custom formula with Google Apps Script.
function pokemon(cell) {
const url = 'https://thesilphroad.com/catalog';
const response = UrlFetchApp.fetch(url);
const content = response.getContentText();
const results = content.match(/(<div class="pokemonOption sighted".+?<\/div>)/g)
/* ... */
}
Upvotes: 1