Reputation: 281
This is my first attempt at learning how to scrape images from a web page and paste them into Google Sheets. I want to download the second image from https://ir.eia.gov/ngs/ngs.html and paste it into a Google Sheet. The page has two images, and I want the second one, which is defined as <img alt="Working Gas in Underground Storage Compared with Five-Year Range" src="ngs.gif" border="0">. I would like to learn how to reference it by its alt or src="ngs.gif" attribute in the code rather than by its index, so that I can apply the concept to other HTML situations as well. Can anyone help fix the following code so that I can learn? Thank you!
function test() {
  const url = 'https://ir.eia.gov/ngs/ngs.html';
  const res = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
  var $ = Cheerio.load(res);
  // I want to download the image, <img alt="Working Gas in Underground Storage Compared with Five-Year Range" src="ngs.gif" border="0">
  // What should be changed in the following code?
  var chart = $('img').attr('src').find('ngs.gif');
  SpreadsheetApp.getActiveSheet().insertImage(chart, 1, 1);
}
Upvotes: 0
Views: 671
Reputation: 201553
I believe your goal is as follows: you want to retrieve the 2nd image from the img tags and put it into the Spreadsheet.
In this HTML, it seems that the image URL is https://ir.eia.gov/ngs/ + filename, so I thought that the insertImage(url, column, row) method could be used. When this is reflected in your script, how about the following modified script?
function test() {
  const url = 'https://ir.eia.gov/ngs/ngs.html';
  const res = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
  const $ = Cheerio.load(res);
  const urls = [];
  $('img').each(function () {
    urls.push("https://ir.eia.gov/ngs/" + $(this).attr('src'));
  });
  if (urls.length > 1) {
    SpreadsheetApp.getActiveSheet().insertImage(urls[1], 1, 1); // 2nd image is retrieved.
  }
}
When the modified script above is run, the URL https://ir.eia.gov/ngs/ngs.gif is retrieved and the image is put into the Spreadsheet.
About your following new question in the comment:
Thanks a lot! So other than calling the index of the image, is there no method to call either alt="Working Gas in Underground Storage Compared with Five-Year Range" or src="ngs.gif" in the code? I'm just curious to learn a smart way for a potential scenario, for instance, if a web has 20 images and the locations of those images keep changing day by day, so the second image is not always in the second place. Thank you again for any guide!
In this case, how about the following sample script?
function test() {
  const url = 'https://ir.eia.gov/ngs/ngs.html';
  const res = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
  const $ = Cheerio.load(res);
  const obj = [];
  $('img').each(function () {
    const t = $(this);
    const src = t.attr('src');
    obj.push({ alt: t.attr('alt'), src: src, url: "https://ir.eia.gov/ngs/" + src });
  });
  const searchAltValue = "Working Gas in Underground Storage Compared with Five-Year Range";
  const searchSrcValue = "ngs.gif";
  const ar = obj.filter(({alt, src}) => alt == searchAltValue && src == searchSrcValue);
  if (ar.length > 0) {
    SpreadsheetApp.getActiveSheet().insertImage(ar[0].url, 1, 1);
  }
}
In this sample script, when alt and src are Working Gas in Underground Storage Compared with Five-Year Range and ngs.gif, respectively, the URL is retrieved and the image is put into the Spreadsheet.
If you want to match images whose alt is Working Gas in Underground Storage Compared with Five-Year Range OR whose src is ngs.gif, please modify alt == searchAltValue && src == searchSrcValue to alt == searchAltValue || src == searchSrcValue.
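As a possible alternative to collecting every img tag and filtering the array, a CSS attribute selector such as img[src="ngs.gif"] or img[alt="..."] can pick the tag directly, which is close to what the question asks for. The following is only a minimal sketch under some assumptions: resolveSrc, findImageUrl and insertChartImage are hypothetical helper names that are not part of any library, the attribute value is assumed to contain no double quotes, and the URL is resolved by hand because, as far as I know, the URL class is not available in Apps Script. Relative paths containing ../ are not handled.

```javascript
// Hypothetical helper: turn an img src into an absolute URL.
// Assumes pageUrl is an absolute http(s) URL; "../" segments are not handled.
function resolveSrc(pageUrl, src) {
  if (/^https?:\/\//i.test(src)) return src; // already absolute
  const origin = pageUrl.match(/^https?:\/\/[^\/?#]+/i)[0];
  if (src.startsWith('//')) return origin.split(':')[0] + ':' + src; // protocol-relative
  if (src.startsWith('/')) return origin + src; // root-relative
  const path = pageUrl.slice(origin.length).split(/[?#]/)[0];
  const dir = path.slice(0, path.lastIndexOf('/') + 1) || '/';
  return origin + dir + src; // relative to the page's directory
}

// Hypothetical helper: find the first img whose attribute (for example
// 'alt' or 'src') equals the given value, and return its absolute URL.
// Returns null when no such img exists.
function findImageUrl($, pageUrl, attrName, attrValue) {
  const img = $('img[' + attrName + '="' + attrValue + '"]').first();
  const src = img.attr('src');
  return src ? resolveSrc(pageUrl, src) : null;
}

// Apps Script entry point. UrlFetchApp, Cheerio and SpreadsheetApp are
// used in the same way as in the scripts above.
function insertChartImage() {
  const pageUrl = 'https://ir.eia.gov/ngs/ngs.html';
  const html = UrlFetchApp.fetch(pageUrl, { muteHttpExceptions: true }).getContentText();
  const $ = Cheerio.load(html);
  const imageUrl = findImageUrl($, pageUrl, 'src', 'ngs.gif');
  if (imageUrl) {
    SpreadsheetApp.getActiveSheet().insertImage(imageUrl, 1, 1);
  }
}
```

To search by alt instead, findImageUrl($, pageUrl, 'alt', 'Working Gas in Underground Storage Compared with Five-Year Range') could be used in the same way. Because the lookup does not depend on the position of the tag, it should keep working when the order of the images on the page changes.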
Upvotes: 3