Reputation: 147
I'm trying to scrape https://www.baseball-reference.com/players/p/pujolal01.shtml for player stats, specifically the Standard Batting and Player Value--Batting tables. Here's part of my code:
const page = cheerio.load(response.data);
const statsTable = page('#batting_standard');
const rows = statsTable.find('tbody > tr').not('.minors_table').add(statsTable.find('tfoot > tr:first'));
const moreStatsTable = page('#batting_value');
const moreRows = moreStatsTable.find('tbody > tr, tfoot > tr:first');
For some reason, it retrieves the first table (id = 'batting_standard') but not the second (id = 'batting_value'), so that moreStatsTable = null. What's going on? I don't understand why cheerio can't find the value table, since it has a unique id. Is it just me having this issue?
Upvotes: 1
Views: 618
Reputation: 57185
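Expanding on chitown88's comment, the data you want appears to be inside HTML comments in the server response. The site uses JS after the page loads to display the HTML from these comments. Cheerio doesn't run JS, so it only sees the table as comment text, not as an element it can select by id.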
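To make the idea concrete, here is a minimal sketch of what "data inside comments" looks like and how the comment bodies could be pulled out. The `extractComments` helper and the sample markup are made up for illustration (the real page's markup will differ), and the regex is a simplification rather than a real HTML parser:

```javascript
// Hypothetical helper: return the trimmed body of every HTML comment in `html`.
// A regex is a shortcut here; it ignores edge cases such as "<!--" inside <script>.
function extractComments(html) {
  return [...html.matchAll(/<!--([\s\S]*?)-->/g)].map(m => m[1].trim());
}

// Made-up sample imitating a table that is shipped inside a comment.
const sample = `
<div id="all_batting_value">
<!--
<table id="batting_value"><tr><th>Year</th></tr></table>
-->
</div>`;

// Pick the comment that contains the table we are after.
const hidden = extractComments(sample).find(c => c.includes('id="batting_value"'));
console.log(hidden);
```

The string in `hidden` is ordinary HTML, so it could then be passed to `cheerio.load` and queried like any other document.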
Cheerio GitHub issue #423 describes a way to identify and extract data from comments. I adapted it to your use case to find the particular table you want:
const cheerio = require("cheerio"); // 1.0.0-rc.12

const url = "https://www.baseball-reference.com/players/p/pujolal01.shtml";

fetch(url) // Node 18 or install node-fetch, or use another library like axios
  .then(res => {
    if (!res.ok) {
      throw Error(res.statusText);
    }

    return res.text();
  })
  .then(html => {
    const $ = cheerio.load(html);
    $("*").map((i, el) => {
      $(el).contents().map((i, el) => {
        if (el.type === "comment") {
          const $ = cheerio.load(el.data);
          const table = $("#batting_value").first();

          if (table.length) {
            const data = [...table.find("tr")].map(e =>
              [...$(e).find("td, th")].map(e => $(e).text().trim())
            );
            // trim the table a bit for display
            console.table(data.slice(0, 4).map(e => e.slice(0, 4)));
          }
        }
      });
    });
  });
Output:
┌─────────┬────────┬───────┬───────┬──────┐
│ (index) │ 0 │ 1 │ 2 │ 3 │
├─────────┼────────┼───────┼───────┼──────┤
│ 0 │ 'Year' │ 'Age' │ 'Tm' │ 'Lg' │
│ 1 │ '2001' │ '21' │ 'STL' │ 'NL' │
│ 2 │ '2002' │ '22' │ 'STL' │ 'NL' │
│ 3 │ '2003' │ '23' │ 'STL' │ 'NL' │
└─────────┴────────┴───────┴───────┴──────┘
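If you'd rather work with one object per season than with a 2D array, the header row can be used as keys. This is only a sketch: `rowsToObjects` is a hypothetical helper, and it assumes the first row is the header and that every other row has the same number of cells, which may not hold for footer or summary rows on the real page:

```javascript
// Hypothetical helper: turn [[header...], [row...], ...] into an array of objects
// keyed by the header cells. Assumes rows line up with the header by position.
function rowsToObjects(rows) {
  const [header, ...body] = rows;
  return body.map(row =>
    Object.fromEntries(header.map((key, i) => [key, row[i]]))
  );
}

// Values copied from the trimmed output shown above.
const data = [
  ["Year", "Age", "Tm", "Lg"],
  ["2001", "21", "STL", "NL"],
  ["2002", "22", "STL", "NL"],
];

const seasons = rowsToObjects(data);
console.log(seasons[0].Tm);
```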
Upvotes: 1