Jan Hennemann

Reputation: 41

Cheerio direct child selector

Hi everyone, this is my first question on Stack Overflow, so please go easy on me. I'm totally new to web scraping, and at the moment I can't select the right elements. My code looks like this:

var express = require('express');
var path = require('path');
var request = require('request');
var cheerio = require('cheerio');
var fs = require('fs');

var app = express();
var port = 8000;

var url = "http://www.finanzparasiten.de/html/links/awd.html";

request(url, function (err, resp, body) {
    if(!err) {
        var $ = cheerio.load(body)

        var test = $('body table table table > tbody > tr > td > p');
        console.log(test.html())   
        test.each(function (ii, asdf) {
            var rr = $(asdf).find("table").find("tr").first().find('td:nth-child(2)').text();
            console.log(asdf);
        }) 
    } else {
        console.log("we encountered an error: " + err);
    }
});

app.listen(port);
console.log('server is listening on ' + port);

It keeps logging null for test.html(), so the selection appears to be empty. It seems like cheerio has problems with the > (direct child) selector. With jQuery, this selection works as expected.

Thanks to @logol's answer I could solve the first problem, but now I'm facing another one: I have to select direct children of body, and that seems to be buggy in the same way as tbody. Does anyone have a workaround?

Upvotes: 3

Views: 21863

Answers (2)

John

Reputation: 131

Original:

As far as I remember (from the last time I used cheerio), tbody is not recognized in cheerio. Just leave it out and use this instead:

table > tr > td

PS: thead was working
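
A likely reason why dropping tbody helps: browsers insert an implied tbody into every table while parsing, so a selector worked out in the browser contains it, while a parser that keeps the markup exactly as written does not add it. The snippet below is a minimal sketch of that difference on a hand-built tree, without cheerio. The helpers el and matchChildChain are hypothetical names made up for this illustration, not cheerio APIs, and the assumption is that the page source has no explicit tbody tags.

```javascript
// Hypothetical helper: build a tiny element node.
function el(tag, children = []) {
  return { tag, children };
}

// Hypothetical helper: match a chain of direct children such as
// ['table', 'tbody', 'tr', 'td'] (i.e. "table > tbody > tr > td").
// The first step may match at any depth below root.
function matchChildChain(root, chain) {
  const results = [];
  // Follow the remaining steps through direct children only.
  function descend(node, steps) {
    if (steps.length === 0) {
      results.push(node);
      return;
    }
    for (const child of node.children) {
      if (child.tag === steps[0]) descend(child, steps.slice(1));
    }
  }
  // Try every node in the tree as a candidate for the first step.
  function walk(node) {
    if (node.tag === chain[0]) descend(node, chain.slice(1));
    node.children.forEach(walk);
  }
  walk(root);
  return results;
}

// Tree as written in the markup: <table><tr><td></td></tr></table>
const asWritten = el('body', [el('table', [el('tr', [el('td')])])]);
// Tree as a browser builds it: an implied <tbody> wraps the row.
const asBrowser = el('body', [
  el('table', [el('tbody', [el('tr', [el('td')])])]),
]);

console.log(matchChildChain(asWritten, ['table', 'tbody', 'tr', 'td']).length); // 0
console.log(matchChildChain(asWritten, ['table', 'tr', 'td']).length); // 1
console.log(matchChildChain(asBrowser, ['table', 'tbody', 'tr', 'td']).length); // 1
```

If the markup does contain explicit tbody tags, the situation flips and the selector needs the tbody step.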

Update:

It seems to work sometimes even with tbody. Try this in the REPL:

const cheerio = require('cheerio');
const html = '\
<!DOCTYPE html>\
<html>\
  <head>\
    <title>Cheerio Test</title>\
  </head>\
  <body>\
    <div id="#1">\
      <table>\
        <thead>\
          <tr>\
            <th>Month</th>\
            <th>Savings</th>\
          </tr>\
        </thead>\
        <tfoot>\
          <tr>\
            <td>Sum</td>\
            <td>180</td>\
          </tr>\
        </tfoot>\
        <tbody>\
          <tr>\
            <td>January</td>\
            <td>100</td>\
          </tr>\
          <tr>\
            <td>February</td>\
            <td>80</td>\
          </tr>\
        </tbody>\
      </table>\
    </div>\
  </body>\
</html>';
const dom = cheerio.load(html);

// not working (note: the id attribute above is literally "#1", while div#1 looks for id "1"):
let tds1 = dom('div#1 > table > tbody > tr > td').map(function () {
  return dom(this).text().trim();
}).get();

// working:
let tds2 = dom('table > tbody > tr > td').map(function () {
  return dom(this).text().trim();
}).get();

// not working (same id mismatch; the tr elements here are also children of tbody, not of table):
let tds3 = dom('div#1 > table > tr > td').map(function () {
  return dom(this).text().trim();
}).get();

console.log(tds1);
console.log(tds2);
console.log(tds3);
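
Since it depends on the markup whether the tbody step is needed, one way to avoid guessing is to try the selector as given and fall back to a version without the tbody steps. This is only a sketch: withoutTbody and selectWithFallback are hypothetical helpers, and query stands for any function that takes a selector and returns something with a numeric length, for example the function returned by cheerio.load().

```javascript
// Hypothetical helper: remove "tbody" steps from a child-combinator selector,
// e.g. "table > tbody > tr > td" becomes "table > tr > td".
function withoutTbody(selector) {
  return selector
    .split('>')
    .map((step) => step.trim())
    .filter((step) => step !== 'tbody')
    .join(' > ');
}

// Hypothetical helper: return the result of the first selector variant
// that matches anything, trying the original selector first.
function selectWithFallback(query, selector) {
  const first = query(selector);
  return first.length > 0 ? first : query(withoutTbody(selector));
}

console.log(withoutTbody('table > tbody > tr > td')); // table > tr > td
```

With a loaded document this could be called as selectWithFallback(dom, 'table > tbody > tr > td'). Note that it only strips tbody when it is a standalone step between > combinators.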

Upvotes: 4

user3366016

Reputation: 1312

Update:

Based on @logol's response, I checked the cheerio docs, which say that its selectors are built on the CSSselect library. That library's docs have a list of supported selectors: child and parent selectors are supported, and the list seems to imply that all element selectors are too. However, this GitHub issue flags the tbody issue.

Original:

Did you mean to repeat table in your selector, and is that how you intended to print the result to the console?

Try this:

var test = $('body table > tbody > tr > td > p');
console.log(test.html()) // a cheerio selection has no innerHTML property; use .html()

The output of this on the webpage is:

<span class="TDheadlinebig">AWD - Allgemeiner
                Wirtschaftsdienst</span><span class="TDnormal"><br>
                </span><span class="TDheadlinenormal">zweitgrößte "Strukkibude"
                </span><span class="TDnormal"><br>
                </span>

Upvotes: 2
