Reputation: 31
I feel like I've tried every solution out here, and have yet to accomplish this task.
I'm looking to scrape the SECOND (playoffs) table on this link:
https://www.basketball-reference.com/players/c/curryst01/gamelog/2016
The first table comes in very easily using IMPORTHTML, the second however I haven't been able to locate.
I've tried using IMPORTHTML with 100 different tables & lists. I also looked in inspector and did a CTRL F on <table and see the info there.
I read that it could be because it's a Javascript object, but when I turned off Javascript (like someone suggested), I still see the table, which leads me to believe it can definitely be scraped into a Google Sheet.
I tried ImportXML as well, but I'm not as familiar and wasn't able to find the info with that either.
Are there any suggestions on how I could scrape this? Seems bizarre to me that it is this difficult!
Upvotes: 1
Views: 383
Reputation: 15308
Try this, it will give you the main table
=importhtml(url,"table",8)
you can also retrieve informations for tables #1 to #7
Upvotes: 0
Reputation: 201378
Unfortunately, it seems that IMPORTHTML and IMPORTXML cannot be used for retrieving the table you expect. But, fortunately, I noticed that when the HTML is retrieved by Google Apps Script, the HTML data includes the table of the SECOND (playoffs) table
you expect. So in this answer, I would like to propose to use Google Apps Script.
Please copy and paste the following script to the script editor of Google Spreadsheet, and please enable Sheets API at Advanced Google services. And, please run myFunction
at the script editor. By this, the retrieved table is put to the sheet.
function myFunction() {
const url = "https://www.basketball-reference.com/players/c/curryst01/gamelog/2016"; // This URL is from your question.
const sheetName = "Sheet1"; // Please set the destination sheet name.
const html = UrlFetchApp.fetch(url).getContentText();
const tables = [...html.matchAll(/<table[\s\S\w]+?<\/table>/g)];
if (tables.length > 8) {
const ss = SpreadsheetApp.getActiveSpreadsheet();
Sheets.Spreadsheets.batchUpdate({ requests: [{ pasteData: { html: true, data: tables[8][0], coordinate: { sheetId: ss.getSheetByName(sheetName).getSheetId() } } }] }, ss.getId());
return;
}
throw new Error("Expected table cannot be retrieved.");
}
When this script is run, the following result can be obtained.
Upvotes: 0
Reputation: 31
I learned I wasn't turning off Javascript properly... well, now the table is gone. So I'm assuming this means it cannot be scraped into Sheets.
Still curious what solutions are out there - I'm currently working on it using ParseHub, but I'd really love to understand how it could be done in Sheets
Upvotes: 0