McKay S
McKay S

Reputation: 23

Can we get the attribute names themselves from Sheets importxml

I am trying to get a list of the data attributes from a website using Google Sheets IMPORTXML function. To clarify, I want the names of the attributes themselves as well as their values, and I don't need the text of the table itself.

Trying the website: https://thesilphroad.com/catalog

A sample of the code I want to extract from is:

<div class="pokemonOption sighted" data-nests="1" data-raid-boss="0" data-obtainable="1" data-released="1" data-shiny-obtainable="1" data-shiny-released="1" data-shadow-available="" data-shadow-released="1" data-pokemon-slug="bulbasaur" style="background-image:url(https://assets.thesilphroad.com/img/pokemon/icons/96x96/1.png), radial-gradient(#a9f712, #2ecc71);"><span>#001</span></div>

The formula I'm using is:

=importxml("https://thesilphroad.com/catalog","//div[@class='pokemonOption sighted']/@*

and it returns all the values of the attributes in 1 column, eg:

attributes
pokemonOption sighted
1
1
Bulbasaur
etc...

But what I need is the names as well, eg data-nests="1". Alternatively, what about just a list of the attributes of that , eg.

attribute names
class
data-nests
data-raid-boss
data-released
etc...

Does someone know how to extract this into Sheets using IMPORTXML or another method?

Thanks!

Upvotes: 1

Views: 659

Answers (1)

idfurw
idfurw

Reputation: 5852

I am not so sure, but I think IMPORTXML uses XPath 1.0.

To get all attributes, you need XPath 2.0.

You may want to create a custom formula with Google Apps Script.

function pokemon(cell) {
  const url = 'https://thesilphroad.com/catalog';
  const response = UrlFetchApp.fetch(url);
  const content = response.getContentText();
  const results = content.match(/(<div class="pokemonOption sighted".+?<\/div>)/g)
  /* ... */
}

Upvotes: 1

Related Questions