cybergerac
cybergerac

Reputation: 11

scraping an XML webpage with IMPORTXML

I am trying to scrape this webpage: https://cordis.europa.eu/project/id/654008 to obtain in tabular format some of the values in XML: legalName, shortName, country, id, city, vatNumber, ecContribution, etc.

Here is an example with headers of the table I wish to obtain.

I have tried using IMPORTXML in Google Sheets but with no great success so far. My problem is that I cannot identify the correct xpath query, even when using chrome extensions like SelectorGadget and XPath Helper who generally helps you find them.

One note: it seems that the "coordinator" table use different html tags. Do you have any suggestions to correctly scrape this page? Thank you in advance!

Upvotes: 1

Views: 81

Answers (2)

Muhammet Yunus
Muhammet Yunus

Reputation: 589

When we try to fetch this page directly it returns JavaScript is disabled on your browser. But when we analyze requests of the page we find a request than returns all data you need as a JSON. It's

https://cordis.europa.eu/api/details?contenttype=project&rcn=654008&lang=en&paramType=id

We fetch this page via URLFetchApp, and push data we wish to an array, then write them to our sheet.

function fetch() {
  var url = 'https://cordis.europa.eu/api/details?contenttype=project&rcn=654008&lang=en&paramType=id';
  var response = UrlFetchApp.fetch(url);
  var json = JSON.parse(response.getContentText()).payload.organizations.participants;
  var array = [];

  json.forEach(el => {
    array.push([el.name, , el.country.name, el.organizationId, el.address.city, , el.contribution]);
  })
  var ss = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('main');
  var myRange = ss.getRange(2, 1, array.length, 7);
  myRange.setValues(array);

}

Upvotes: 2

player0
player0

Reputation: 1

maybe this will help:

=INDEX(REGEXEXTRACT(QUERY(FLATTEN(IMPORTDATA(
 "https://cordis.europa.eu/project/id/654008/en?format=xml")), 
 "where Col1 contains 'country'"), ">(.*)<"))

enter image description here

Upvotes: 1

Related Questions