Anthony Madle

Reputation: 31

Finding a table using ImportHTML

I feel like I've tried every solution out here, and have yet to accomplish this task.

I'm looking to scrape the SECOND (playoffs) table on this link:

https://www.basketball-reference.com/players/c/curryst01/gamelog/2016

The first table comes in easily using IMPORTHTML; the second, however, I haven't been able to locate.

I've tried IMPORTHTML with 100 different table and list indices. I also looked in the inspector and did a Ctrl+F for <table, and I can see the info there.

I read that it could be because it's a JavaScript object, but when I turned off JavaScript (like someone suggested), I still saw the table, which leads me to believe it can definitely be scraped into a Google Sheet.

I tried ImportXML as well, but I'm not as familiar and wasn't able to find the info with that either.

Are there any suggestions on how I could scrape this? Seems bizarre to me that it is this difficult!

Upvotes: 1

Views: 383

Answers (3)

Mike Steelson

Reputation: 15308

Try this; it will give you the main table:

=IMPORTHTML(url,"table",8)

You can also retrieve information for tables #1 to #7.
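For reference, with the question's URL substituted for url, the formula would read as below (IMPORTHTML's index argument is 1-based, so stepping it down from 8 walks through the other tables):

```
=IMPORTHTML("https://www.basketball-reference.com/players/c/curryst01/gamelog/2016", "table", 8)
```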

Upvotes: 0

Tanaike

Reputation: 201378

Unfortunately, it seems that IMPORTHTML and IMPORTXML cannot be used for retrieving the table you expect. But, fortunately, I noticed that when the HTML is retrieved by Google Apps Script, the raw HTML data does include the SECOND (playoffs) table you expect. So in this answer, I would like to propose using Google Apps Script.

Sample script:

Please copy and paste the following script into the script editor of the Google Spreadsheet, and enable the Sheets API under Advanced Google services. Then run myFunction in the script editor. The retrieved table will be put into the sheet.

function myFunction() {
  const url = "https://www.basketball-reference.com/players/c/curryst01/gamelog/2016"; // This URL is from your question.
  const sheetName = "Sheet1"; // Please set the destination sheet name.

  // Fetch the raw HTML and collect every <table> element in the source text.
  const html = UrlFetchApp.fetch(url).getContentText();
  const tables = [...html.matchAll(/<table[\s\S]+?<\/table>/g)];
  if (tables.length > 8) {
    // Paste the 9th matched table (index 8) into the destination sheet,
    // letting the Sheets API parse the HTML into cells.
    const ss = SpreadsheetApp.getActiveSpreadsheet();
    Sheets.Spreadsheets.batchUpdate({
      requests: [{
        pasteData: {
          html: true,
          data: tables[8][0],
          coordinate: { sheetId: ss.getSheetByName(sheetName).getSheetId() }
        }
      }]
    }, ss.getId());
    return;
  }
  throw new Error("Expected table cannot be retrieved.");
}
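The likely reason IMPORTHTML misses the table is that the site ships it inside an HTML comment and reveals it with JavaScript, so a parser-based importer never sees it. A regex over the raw fetched text still matches commented-out markup, which is what the script relies on. A minimal sketch in plain JavaScript (the markup and the id pgl_basic_playoffs are illustrative, not the site's exact source):

```javascript
// A table hidden inside an HTML comment: invisible to an HTML parser's
// table query, but still present in the raw source text.
const html = `
<div class="placeholder"><!--
<table id="pgl_basic_playoffs"><tr><td>Playoffs</td></tr></table>
--></div>`;

// The same regex idea as in the Apps Script above: match any <table>...</table>
// span in the raw text, comments included.
const tables = [...html.matchAll(/<table[\s\S]+?<\/table>/g)];
console.log(tables.length); // 1
```

This also matches the questioner's later observation: with JavaScript truly disabled, the table disappears from the rendered page, because only the comment is left.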

Result:

When this script is run, the playoffs table is put into the destination sheet.


Upvotes: 0

Anthony Madle

Reputation: 31

I learned I wasn't turning off JavaScript properly... well, now the table is gone. So I'm assuming this means it cannot be scraped directly into Sheets.

I'm still curious what solutions are out there. I'm currently working on it using ParseHub, but I'd really love to understand how it could be done in Sheets.

Upvotes: 0
