Reputation: 417
I am just starting out in Google Apps Script. Since best coding practices recommend using as few sheet formulas as possible I am trying to do my web scraping with GAS Parser then push the data over to my spreadsheet.
Within my sheet using the below formula returns a table of data which is exactly what I am looking for from GAS.
=IMPORTHTML("https://finance.yahoo.com/quote/BOO.L/history?p=BOO.L", "table", 1)
The two questions here & here are similar but trying those methods also fail. It almost seems like I am not getting the full page content since when I view data in Logger.log() after the code below I am not getting anything that resembles the page I need.
UrlFetchApp.fetch(url).getContentText();
Since running the formula seems to get the data perfectly I can only assume the problems with my own code but can't figure where. Here is the code I have tried thus far;
function scrapeData() {
var url = "https://finance.yahoo.com/quote/BARC.L/history?p=BARC.L";
var fromText = '<td class="Py(10px) Ta(start) Pend(10px)"><span>';
var toText = '</span></td>';
var content = UrlFetchApp.fetch(url).getContentText();
var scraped = Parser
.data(content)
.from(fromText)
.to(toText)
.iterate();
Logger.log(scraped)
}
Any guidance much appreciated.
Upvotes: 1
Views: 559
Reputation: 201358
If my understanding is correct, how about this modification? I think that there are several answers for your situation. So please think of this as one of them.
Parser
and XmlService
.
Parser
.XmlService
. I think that XmlService
makes us easily parse the table.XmlService
is the strong parsing tool of XML. So when this can be used to HTML, it makes us retrieve the values from HTML more easily. However, recently, the most HTML cannot be directly parsed by XmlService
. So I always use this flow.
function scrapeData() {
// Retrieve table as a string using Parser.
var url = "https://finance.yahoo.com/quote/BOO.L/history?p=BOO.L";
// var url = "https://finance.yahoo.com/quote/BARC.L/history?p=BARC.L";
var fromText = '<div class="Pb(10px) Ovx(a) W(100%)" data-reactid="30">';
var toText = '<div class="Mstart(30px) Pt(10px)"';
var content = UrlFetchApp.fetch(url).getContentText();
var scraped = Parser.data(content).from(fromText).to(toText).build();
// Parse table using XmlService.
var root = XmlService.parse(scraped).getRootElement();
// Retrieve header
var headerTr = root.getChild("thead").getChildren();
var res = headerTr.map(function(e) {return e.getChildren().map(function(f) {return f.getValue()})});
var len = res[0].length;
// Retrieve values
var valuesTr = root.getChild("tbody").getChildren();
var values = valuesTr.map(function(e) {return e.getChildren().map(function(f) {return f.getValue()})})
.map(function(e) {return e.length == len ? e : e.concat(Array.apply(null, new Array(len - e.length)).map(String.prototype.valueOf,""))});
Array.prototype.push.apply(res, values);
// Put the result to the active spreadsheet.
var ss = SpreadsheetApp.getActiveSheet();
ss.getRange(1, 1, res.length, res[0].length).setValues(res);
}
p=BOO.L
and p=BARC.L
. I couldn't confirm others. So when you tried others, if an error occurs, please modify the script.If this was not what you want, I'm sorry.
Upvotes: 1