Eric Moran
Eric Moran

Reputation: 5

Extracting Metadata in Google Sheets ImportXML

Is it still possible to extract metadata like price from a site using ImportXML in Google Sheets?

I have tried multiple different variations on the following page without success: https://www.officedepot.com/a/products/273646/Office-Depot-White-Copy-Paper-Letter/

=IMPORTXML("https://www.officedepot.com/a/products/273646/Office-Depot-White-Copy-Paper-Letter/","//*[contains(@itemprop,'price')]/@content")

=IMPORTXML("https://www.officedepot.com/a/products/273646/Office-Depot-White-Copy-Paper-Letter/","//meta[@itemprop='price']/@content")

I should be able to use this formula to return "58.99", but I continuously receive an NA error.

Upvotes: 0

Views: 720

Answers (1)

E.Wiest
E.Wiest

Reputation: 5905

OfficeDepot seems to block the requests from GoogleSheets. Some clues :

enter image description here

You need an API key and an ImportJSON script (credits to Brad Jasper) to do this. Once you have installed the script and activated your API key, add a search engine. In the settings, you have to define the target website.

enter image description here

Copy somewhere your search engine id (cx=XXXXXXXXXX). Once this is done and assuming you have the urls in column A, you can paste in cell B2 :

=REGEXEXTRACT(A2;"products\/(\d+)")

This is for extracting the product id.

In cell C2, you can paste :

="https://customsearch.googleapis.com/customsearch/v1?cx={yoursearchengineID}&key={yourAPIkey}&num=1&fields=items(pagemap(offer(price)))&q="&B2

We construct the request for the API. You need to add your API key and your search engine id in this formula.

In cell D2, you can paste :

=QUERY(ImportJSON(C2);"SELECT Col1 label Col1''";1)

This for importing the .json result and cleaning it a bit.

Notes : this method could fail with some products (the new ones). I'm based in Europe. So ";" in the formulas should be replaced with ",".

Upvotes: 1

Related Questions