Reputation: 448
I have the below function trying to convert a webpage to xml, so I can start extracting out some data from tables etc.
function getWebpageContent() {
var url = "https://training.gov.au/Training/Details/BSBCRT501";
var xml = UrlFetchApp.fetch(url).getContentText();
var document = XmlService.parse(xml);
Logger.log(document);
}
I'm recieving this error:
Exception: Error on line 170: The entity name must immediately follow the '&' in the entity reference.
getWebpageContent @ Code.gs:6
When I search that webpage for the "&" symbol, (assuming that XmlService is confusing the 'and' symbol for some sort of html code and throwing an error) I can only find one hidden one. And am not sure how to circumvent it.
Any way to dodge that error and get the webpage info as Xml in Apps Script?
Upvotes: 1
Views: 838
Reputation: 201713
From your following replying,
The output I want from this page (
https://training.gov.au/Training/Details/BSBCRT501
) is eachin the 'Elements and Performance Criteria' table. I want to save it as an array to then reformat into my spreadsheet. I might just use IMPORTXML in a spreadsheet formula instead.
In this case, how about the following formula?
=IMPORTXML("https://training.gov.au/Training/Details/BSBCRT501","//table[2]//tr")
From your following replying,
That's a great answer thanks and the method I think I'll use. It doesn't link break at each point (2.1, 2.2 etc) unfortunately but it's still good. I don't think I can accept it as the answer though as it doesn't solve the specific Apps Script problem, but thanks a lot for this.
I added a sample script for using Google Apps Script. Could you please confirm it?
Before you use this script, please enable Sheets API at Advanced Google services. When you run this script, the table is put to the active sheet.
function myFunction() {
const url = "https://training.gov.au/Training/Details/BSBCRT501";
const res = UrlFetchApp.fetch(url, {muteHttpExceptions: true});
if (res.getResponseCode() != 200) throw new Error(res.getContentText());
const table = [...res.getContentText().matchAll(/<TABLE[\s\S\w]+?<\/TABLE>/g)];
if (table && table[1][0]) {
const spreadsheet = SpreadsheetApp.getActiveSpreadsheet();
const sheet = spreadsheet.getActiveSheet();
const resource = {requests: [{pasteData: {html: true, data: table[1][0], coordinate: {sheetId: sheet.getSheetId()}}}]};
Sheets.Spreadsheets.batchUpdate(resource, spreadsheet.getId());
}
}
Upvotes: 1
Reputation: 131
This guy here explains the underlying cause, and also gives the solution:
The entity name must immediately follow the '&' in the entity reference
Upvotes: 1