Reputation: 379
I used the Xml service before, but I got the message "Xml is deprecated." So I understand that Xml cannot be used in the future and that XmlService should be used instead.
Here is my previous code. The solution comes from here (by Mr. Justin Bicknell).
function xml_parsing(senderId) {
  var url = "https://home.gamer.com.tw/homeindex.php?owner=" + senderId;
  var fetch = UrlFetchApp.fetch(url);
  // Deprecated Xml service: the second argument turns on lenient parsing.
  var doc = Xml.parse(fetch, true);
  var bodyHtml = doc.html.body.toXmlString();
  var xml = UrlFetchApp.fetch(url).getContentText();
  var doc_parse = XmlService.parse(xml);
  var root = doc_parse.getRootElement();
}
Then I removed the Xml calls to fix it.
function xml_parsing(senderId) {
  var url = "https://home.gamer.com.tw/homeindex.php?owner=" + senderId;
  var fetch = UrlFetchApp.fetch(url).getContentText();
  var doc_parse = XmlService.parse(fetch);
  var root = doc_parse.getRootElement();
}
Then an error about entities occurred:
The entity name must immediately follow the '&' in the entity reference
So I tried to fix the URL by converting its special characters to entities.
var url = "https://home.gamer.com.tw/homeindex.php?owner="+ senderId
That did not fix the error either.
I searched for other documentation. One source said that XmlService.parse is too strict for HTML, because HTML follows a less strict standard. (For example, an HTML tag can stand alone without a closing tag, but in XML every tag must be closed.)
So I want to ask: how can I use XmlService.parse in this situation?
Thanks!
Upvotes: 1
Views: 989
Reputation: 15375
You need to make sure that the string you are passing to XmlService.parse() is valid XML, with no malformed tags and no unescaped special characters.
The deprecated Xml.parse() method of Apps Script accepted an optional leniency parameter; XmlService.parse() has no equivalent.
XmlService.parse()
is an XML parser, not an HTML parser. While the two document types have similar base structures, there are a few differences which cause XmlService.parse()
to throw an error.
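To see which inputs XmlService.parse() accepts, one option is to wrap the call and report the failure instead of letting it stop the script. This is only a minimal sketch: tryParseXml is a hypothetical helper name, not part of Apps Script, and the exact error text depends on the input.

```javascript
// Hypothetical helper (not an Apps Script API): wraps XmlService.parse()
// so a parse failure is returned as data instead of aborting the script.
function tryParseXml(text) {
  try {
    var root = XmlService.parse(text).getRootElement();
    return { ok: true, rootName: root.getName() };
  } catch (e) {
    // The parser's message usually names the construct it rejected.
    return { ok: false, error: String(e && e.message ? e.message : e) };
  }
}

// Usage inside Apps Script (assumed input strings):
//   tryParseXml('<root><item/></root>')        // well-formed XML
//   tryParseXml('<html><meta></html>')         // unclosed <meta>, expected to fail
```

Checking the ok flag first makes it easy to log the parser's message for a fetched page before deciding how to clean it up.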
The first problem is that XML documents cannot have unclosed tags. As all HTML documents start with a <!DOCTYPE html> tag, XmlService.parse() reads this as an open XML tag, but because HTML does not close it, the parser treats it as a malformed structure. <meta> tags in an HTML document cause the same problem, as they too are non-closing; in fact, any HTML tag written without a closing counterpart will cause XmlService.parse() to throw an error. User Tanaike has a really powerful workflow to rectify this, which you can find here.
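As a rough illustration of the unclosed-tag problem (this is not the linked workflow, just a naive sketch), the snippet below drops the doctype line and rewrites common non-closing HTML tags as self-closing ones before the string is handed to the parser. The helper name closeVoidTags and the tag list are assumptions made for this example, and a regex like this will not cope with every page, for instance attributes that contain a ">" character.

```javascript
// Hypothetical helper, not part of Apps Script: naive regex-based cleanup.
// Assumed list of HTML tags that are normally written without a closing tag.
var VOID_TAGS = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
                 'link', 'meta', 'source', 'track', 'wbr'];

function closeVoidTags(html) {
  // Drop the doctype declaration so only element markup remains.
  var out = html.replace(/<!DOCTYPE[^>]*>/i, '');
  // Match e.g. <meta charset="utf-8"> or <br>, with or without a trailing slash,
  // and rewrite it in self-closing form.
  var pattern = new RegExp('<(' + VOID_TAGS.join('|') + ')\\b([^>]*?)\\s*/?>', 'gi');
  return out.replace(pattern, '<$1$2 />');
}

// Usage inside Apps Script (may still fail on other HTML quirks):
//   var cleaned = closeVoidTags(UrlFetchApp.fetch(url).getContentText());
//   var root = XmlService.parse(cleaned).getRootElement();
```

Real pages often have further issues (unquoted attributes, stray ampersands, inline scripts), so this only addresses the non-closing tags described above.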
The second problem is that the document you are trying to fetch contains embedded JavaScript within <script></script> tags. XML has 5 special characters: &, ", ', <, and >.
All five of these characters are used as operators or string delimiters in JavaScript, so unless they have been escaped into their XML-safe forms (&amp;, &quot;, &apos;, &lt;, and &gt; respectively), the parser tries to read them as XML markup. This is why it throws an entity reference error. In your example page, The reference to entity "l" must end with the ';' delimiter. is thrown due to an unescaped &l in the code.
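The escaping described above can be sketched in a few lines. Both function names here (escapeXml, stripScripts) are hypothetical helpers written for this example, not Apps Script APIs. The first maps the five special characters to their entities; the second simply removes inline script blocks, which may be the easier route when the JavaScript itself is not needed.

```javascript
// Hypothetical helper: escape the five XML special characters in plain text.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')   // must run first so later entities are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: remove <script>...</script> blocks, including their content.
function stripScripts(html) {
  return html.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '');
}
```

Note that escapeXml is meant for text content or attribute values, not for a whole document, since it would also escape the markup's own angle brackets.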
It seems the XmlService.parse() method is working as intended, as it expects a string of XML, not HTML. There is, however, a bug on Google's Issue Tracker which covers this: now that Xml has been deprecated, there is no longer an Apps Script feature that parses HTML into XML. If you star the bug in the top right of its Issue Tracker page, you can let Google know you are also affected and get updates on their responses.
Upvotes: 1