Nick

Reputation: 147

Cheerio cannot find table by id

I'm trying to scrape https://www.baseball-reference.com/players/p/pujolal01.shtml for player stats, specifically the Standard Batting and Player Value--Batting tables. Here's part of my code:

const page = cheerio.load(response.data);
const statsTable = page('#batting_standard');
const rows = statsTable.find('tbody > tr').not('.minors_table').add(statsTable.find('tfoot > tr:first'));
const moreStatsTable = page('#batting_value');
const moreRows = moreStatsTable.find('tbody > tr, tfoot > tr:first');

For some reason, it retrieves the first table (id = 'batting_standard') but not the second (id = 'batting_value'), so moreStatsTable comes back empty. What's going on? I don't understand why cheerio can't find the value table, since it has a unique id. Is it just me having this issue?

Upvotes: 1

Views: 618

Answers (1)

ggorlen

Reputation: 57185

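Expanding on chitown88's comment, the data you want appears to be inside HTML comments. The site uses JS after the page loads to display the HTML from these comments. Cheerio only parses the static HTML and doesn't run that JS, so the commented-out table never becomes an element a selector can match.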
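To check whether a given table is hidden this way, a rough dependency-free diagnostic like the one below can help. It is only a sketch: the helper name findCommentsWithId and the sample markup are made up for illustration (this is not the real page), and the regex is a quick check, not a substitute for a proper HTML parser.

```javascript
// Hypothetical helper: return the bodies of HTML comments that mention id="<id>".
// A regex is good enough for a quick diagnostic, but it is not a real HTML parser.
function findCommentsWithId(html, id) {
  const commentPattern = /<!--([\s\S]*?)-->/g;
  const matches = [];
  for (const [, body] of html.matchAll(commentPattern)) {
    if (body.includes(`id="${id}"`)) {
      matches.push(body.trim());
    }
  }
  return matches;
}

// Made-up sample markup: one table is live HTML, the other is commented out.
const sampleHtml = `
<div id="all_batting_standard">
  <table id="batting_standard"><tr><td>2001</td></tr></table>
</div>
<div id="all_batting_value">
  <!--
  <table id="batting_value"><tr><td>2001</td></tr></table>
  -->
</div>`;

const hidden = findCommentsWithId(sampleHtml, "batting_value");
console.log(hidden.length); // 1 (the table only exists inside a comment)
console.log(findCommentsWithId(sampleHtml, "batting_standard").length); // 0 (regular markup)
```

If the id shows up in a comment, the comment body can be passed to cheerio.load and queried like any other document.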

Cheerio GitHub issue #423 describes a way to identify and extract data from comments. I adapted it to your use case to find the particular table you want:

const cheerio = require("cheerio"); // 1.0.0-rc.12

const url = "https://www.baseball-reference.com/players/p/pujolal01.shtml";

fetch(url) // Node 18 or install node-fetch, or use another library like axios
  .then(res => {
    if (!res.ok) {
      throw Error(res.statusText);
    }

    return res.text();
  })
  .then(html => {
    const $ = cheerio.load(html);

    // The table is shipped inside an HTML comment, so walk every node
    // and parse each comment's contents as its own document.
    $("*").each((i, el) => {
      $(el).contents().each((j, node) => {
        if (node.type === "comment") {
          const $comment = cheerio.load(node.data);
          const table = $comment("#batting_value").first();

          if (table.length) {
            const data = [...table.find("tr")].map(tr =>
              [...$comment(tr).find("td, th")].map(cell => $comment(cell).text().trim())
            );
            // trim the table a bit for display
            console.table(data.slice(0, 4).map(row => row.slice(0, 4)));
          }
        }
      });
    });
  })
  .catch(err => console.error(err));

Output:

┌─────────┬────────┬───────┬───────┬──────┐
│ (index) │   0    │   1   │   2   │  3   │
├─────────┼────────┼───────┼───────┼──────┤
│    0    │ 'Year' │ 'Age' │ 'Tm'  │ 'Lg' │
│    1    │ '2001' │ '21'  │ 'STL' │ 'NL' │
│    2    │ '2002' │ '22'  │ 'STL' │ 'NL' │
│    3    │ '2003' │ '23'  │ 'STL' │ 'NL' │
└─────────┴────────┴───────┴───────┴──────┘

Upvotes: 1
