Reputation: 1243
Can someone give me an idea of how one might go about scraping data from a Google Sheets page that is not owned by me? Using API not an option.
I see a lot of tutorials about how to USE Sheets for storing web scraping data, but nothing about how to GET info FROM a Sheets page
Hoping I can do this using node.js. Anyone know the best way to go about this?
Upvotes: 3
Views: 1710
Reputation: 201553
I believe your goal as follows.
trying /edit?usp=sharing yielded /edit#gid=0, and /pubhtml yielded page stating: "We're sorry. This document is not published."
, it was found that the Google Spreadsheet is publicly shared which is not as the web publish.In this pattern, the method of "spreadsheets.values.get" in Sheets API is used with the API key. Before you run the script, please retrieve your API key. Ref
const request = require("request");
const apiKey = "###"; // Please set your API key.
const spreadsheetId = "###"; // Please set the Spreadsheet Id.
request.get(
{
url:
`https://sheets.googleapis.com/v4/spreadsheets/${spreadsheetId}/values/A%3AZ?key=${apiKey}`,
json: true,
},
(err, res, body) => {
if (err) {
console.log(body);
return;
}
console.log(body.values);
}
);
When above script is run, the values in the cells A:Z
of the 1st tab are returned from the publicly shared Spreadsheet as 2 dimensional array.
When you want to retrieve other tabs, please retrieve the sheet names using the method of spreadsheets.get in Sheets API. The endpoint of it is as follows.
https://sheets.googleapis.com/v4/spreadsheets/{spreadsheetId}?fields=sheets.properties.title&key={apiKey}
In this pattern, Web Apps created by Google Apps Script is used.
Please do the following flow.
Sample script of Web Apps is a Google Apps Script. So please create a project of Google Apps Script.
If you want to directly create it, please access to https://script.new/. In this case, if you are not logged in Google, the log in screen is opened. So please log in to Google. By this, the script editor of Google Apps Script is opened.
Please copy and paste the following script (Google Apps Script) to the script editor. This script is for the Web Apps.
function doGet(e) {
const spreadsheetId = e.parameter.spreadsheetId;
const ss = SpreadsheetApp.openById(spreadsheetId);
const values = ss.getSheets().map(s => ({sheetName: s.getSheetName(), values: s.getDataRange().getValues()}));
return ContentService.createTextOutput(JSON.stringify(values));
}
https://script.google.com/macros/s/###/exec
.
Please access to the URL of your Web Apps using your browser by including the query parameters as follows. When you have already been logged in Google, the script of Web Apps is run.
const request = require("request");
const spreadsheetId = "###"; // Please set the Spreadsheet Id.
const url = "https://script.google.com/macros/s/###/exec"; // Please set the URL of Web Apps.
request.get(
{
url: `${url}?spreadsheetId=${spreadsheetId}`,
json: true,
},
(err, res, body) => {
if (err) {
console.log(body);
return;
}
console.log(body);
}
);
When above script is run, the following result is returned.
[
sheetName: 'Sheet1',
values: [ [Array], [Array], [Array],,, ]
},
{
sheetName: 'Sheet2',
values: [ [Array], [Array], [Array],,, ]
},
,
,
,
]
Upvotes: 2