Reputation: 97
I'm trying to parse a HTML to retrieve the value of tag, on my Google Apps Script code. contains line breaks in attributes, and appears more than once but I only want the first value. (In this case, only 'foo' is required.)
<b class="
"
>
foo
</b><b class="
"
>
var
</b>
On Google Apps Script, functions such as 'getElementByTagName' is not available. So I first though of using regexp but it's not the wise option here. Does anyone have an idea on how I can move forward? Any comment/guess would be highly appreciated!
Upvotes: 1
Views: 5879
Reputation: 1
Don't use the Parser library (https://www.kutil.org/2016/01/easy-data-scrapping-with-google-apps.html) This is NOT an HTML parser at all; it just looks for text between two regular expressions. If you insist on trying it anyway, you will need the new Script ID: 1Mc8BthYthXx6CoIz90-JiSzSafVnT6U3t0z_W3hLTAX5ek4w0G_EIrNw
obtained from the link "Completed code of Parser library" https://script.google.com/d/1Mc8BthYthXx6CoIz90-JiSzSafVnT6U3t0z_W3hLTAX5ek4w0G_EIrNw/edit?usp=drive_web
on the webpage https://www.kutil.org/2016/01/easy-data-scrapping-with-google-apps.html
Upvotes: -1
Reputation: 201613
How about using XmlService for your situation as a workaround? At XmlService, even if there are several line breaks in the tags, the value can be retrieved. I think that there are several workarounds for your situation. So please think of this as one of them.
The flow of sample script is as follows.
var html = '<b class="\n"\n>\nfoo\n</b><b class="\n"\n>\nvar\n</b>\n'; // Your sample value
var xml = '<?xml version="1.0"?><sampleContents>' + html + '</sampleContents>';
var res = XmlService.parse(xml).getRootElement().getChildren()[0].getText().trim();
Logger.log(res) // foo
If this was not what you want, please tell me. I would like to modify it.
Unfortunately, for the value retrieved from the URL, above script cannot be used. So I used "Parser" which is a GAS library for your situation. The sample script is as follows.
var url = "https://www.booking.com/searchresults.ja.html?ss=kyoto&checkin_year=2018&checkin_month=10&checkin_monthday=1&checkout_year=2018&checkout_month=10&checkout_monthday=2&no_rooms=1&group_adults=1&group_children=0";
var html = UrlFetchApp.fetch(url).getContentText();
var res = Parser.data(html).from("<b class=\"\n\"\n>").to("</b>").build().trim();
Logger.log(res) // US$11
M1lugvAXKKtUxn_vdAG9JZleS6DrsjUUV
For your 2nd URL in your comment, it seems that the URL is different from your 1st one. And also your new URL has no tag of <b class=\"\n\"\n>
. By this, the value you want cannot be retrieved. But from the 1st URL in your comment, I presumed about the value what you want. Please confirm the following script?
var url = "https://www.booking.com/searchresults.ja.html?ss=kyotogranvia&checkin_year=2018&checkin_month=10&checkin_monthday=1&checkout_year=2018&checkout_month=10&checkout_monthday=2&no_rooms=1&group_adults=1&group_children=0";
var html = UrlFetchApp.fetch(url).getContentText();
var res = Parser.data(html).from("<span class=\"lp-postcard-avg-price-value\">").to("</span>").build().trim();
Logger.log(res) // US$289
Upvotes: 4