Avery Freeman
Avery Freeman

Reputation: 1243

Scrape data from someone else's Google Sheets page using node.js?

Can someone give me an idea of how one might go about scraping data from a Google Sheets page that is not owned by me? Using API not an option.

I see a lot of tutorials about how to USE Sheets for storing web scraping data, but nothing about how to GET info FROM a Sheets page

Hoping I can do this using node.js. Anyone know the best way to go about this?

Upvotes: 3

Views: 1710

Answers (1)

Tanaike
Tanaike

Reputation: 201553

I believe your goal as follows.

  • You want to retrieve the values from the publicly shared Google Spreadsheet.
  • You want to achieve this using Node.js.
  • From trying /edit?usp=sharing yielded /edit#gid=0, and /pubhtml yielded page stating: "We're sorry. This document is not published.", it was found that the Google Spreadsheet is publicly shared which is not as the web publish.

Pattern 1:

In this pattern, the method of "spreadsheets.values.get" in Sheets API is used with the API key. Before you run the script, please retrieve your API key. Ref

Sample script:

const request = require("request");

const apiKey = "###";  // Please set your API key.
const spreadsheetId = "###";  // Please set the Spreadsheet Id.
request.get(
  {
    url:
      `https://sheets.googleapis.com/v4/spreadsheets/${spreadsheetId}/values/A%3AZ?key=${apiKey}`,
    json: true,
  },
  (err, res, body) => {
    if (err) {
      console.log(body);
      return;
    }
    console.log(body.values);
  }
);
  • When above script is run, the values in the cells A:Z of the 1st tab are returned from the publicly shared Spreadsheet as 2 dimensional array.

  • When you want to retrieve other tabs, please retrieve the sheet names using the method of spreadsheets.get in Sheets API. The endpoint of it is as follows.

      https://sheets.googleapis.com/v4/spreadsheets/{spreadsheetId}?fields=sheets.properties.title&key={apiKey}
    

Pattern 2:

In this pattern, Web Apps created by Google Apps Script is used.

Usage:

Please do the following flow.

1. Create new project of Google Apps Script.

Sample script of Web Apps is a Google Apps Script. So please create a project of Google Apps Script.

If you want to directly create it, please access to https://script.new/. In this case, if you are not logged in Google, the log in screen is opened. So please log in to Google. By this, the script editor of Google Apps Script is opened.

2. Prepare script.

Please copy and paste the following script (Google Apps Script) to the script editor. This script is for the Web Apps.

function doGet(e) {
  const spreadsheetId = e.parameter.spreadsheetId;
  const ss = SpreadsheetApp.openById(spreadsheetId);
  const values = ss.getSheets().map(s => ({sheetName: s.getSheetName(), values: s.getDataRange().getValues()}));
  return ContentService.createTextOutput(JSON.stringify(values));
}

3. Deploy Web Apps.

  1. On the script editor, Open a dialog box by "Publish" -> "Deploy as web app".
  2. Select "Me" for "Execute the app as:".
  3. Select "Anyone, even anonymous" for "Who has access to the app:".
  4. Click "Deploy" button as new "Project version".
  5. Automatically open a dialog box of "Authorization required".
    1. Click "Review Permissions".
    2. Select own account.
    3. Click "Advanced" at "This app isn't verified".
    4. Click "Go to ### project name ###(unsafe)"
    5. Click "Allow" button.
  6. Click "OK".
  7. Copy the URL of Web Apps. It's like https://script.google.com/macros/s/###/exec.
    • When you modified the Google Apps Script, please redeploy as new version. By this, the modified script is reflected to Web Apps. Please be careful this.

4. Testing Web Apps using Node.js.

Please access to the URL of your Web Apps using your browser by including the query parameters as follows. When you have already been logged in Google, the script of Web Apps is run.

const request = require("request");

const spreadsheetId = "###";  // Please set the Spreadsheet Id.
const url = "https://script.google.com/macros/s/###/exec";  // Please set the URL of Web Apps.
request.get(
  {
    url: `${url}?spreadsheetId=${spreadsheetId}`,
    json: true,
  },
  (err, res, body) => {
    if (err) {
      console.log(body);
      return;
    }
    console.log(body);
  }
);
  • When above script is run, the following result is returned.

      [
          sheetName: 'Sheet1',
          values: [ [Array], [Array], [Array],,, ]
        },
        {
          sheetName: 'Sheet2',
          values: [ [Array], [Array], [Array],,, ]
        },
        ,
        ,
        ,
      ]
    

Note:

  • When you modified the script of Web Apps, please redeploy the Web Apps as new version. By this, the latest script is reflected to Web Apps. Please be careful this.
  • The Google Apps Script is a simple sample script for explaining the method for achieving your goal. So please modify it for your actual situation.

References:

Upvotes: 2

Related Questions