Carol.Kar

Reputation: 5355

Getting TypeError: selector.includes is not a function when scraping with cheerio and jsonframe

I am trying to scrape a website with the following code:

const cheerio = require('cheerio');
const jsonframe = require('jsonframe-cheerio');

const $ = cheerio.load('https://coinmarketcap.com/all/views/all/');
jsonframe($); // initializes the plugin

//exception handling 
process.on('uncaughtException', err =>
  console.error('uncaught exception: ', err))
process.on('unhandledRejection', (reason, p) =>
  console.error('unhandled rejection: ', reason, p))

const frame = {
    "crypto": {         
        "selector": "tbody > tr",   
        "data": [{             
            "name": "td:nth-child(2) > a:nth-child(3)", 
            "url": {                                  
                "selector": "td:nth-child(2) > a:nth-child(3)",    
                "attr": "href"                     
            },
            "marketcap": "tr > td:nth-child(4)",
            "price": "tr > td:nth-child(5) > a:nth-child(1)", 
        }]
    }

};

let companiesList = $('tbody').scrape(frame);
console.log(companiesList); 

However, I get an UnhandledPromiseRejectionWarning when running the above example code:

(node:3890) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): TypeError: selector.includes is not a function

Any suggestions on what I am doing wrong?

I appreciate your replies!

UPDATE

I changed my code to the following. However, I can only scrape the first element.

Any suggestions why the other elements do not get scraped?

const cheerio = require('cheerio')
const jsonframe = require('jsonframe-cheerio')
const got = require('got');


async function scrapCoinmarketCap() {
    const url = 'https://coinmarketcap.com/all/views/all/'
    const html = await got(url)
    const $ = cheerio.load(html.body)

    jsonframe($) // initializing the plugin

    let frame = {
        "Coin": "td.no-wrap.currency-name > a",
        "url": "td.no-wrap.currency-name > a @ href",
        "Symbol": "td.text-left.col-symbol",
        "Price": "td:nth-child(5) > a",
    }

    console.log($('body').scrape(frame, {
        string: true
    }))
}

scrapCoinmarketCap()

Upvotes: 10

Views: 1695

Answers (2)

TGrif

Reputation: 5931

Based on your updated code, you can scrape all the currency data by iterating over each tr:

$('body tr').each(function() {
  console.log($(this).scrape(frame, {
    string: true
  }))
})

However, I think the cleanest way to do this (as I said in another answer) is to use jsonframe-cheerio's List/Array frame pattern, which is intended for exactly this:

let frame = {
  currency: {
    _s: "tr",  // the selector
    _d: [{  // gives you an array of data, not just the first item
      "Coin": "td.no-wrap.currency-name > a",
      "Url": "td.no-wrap.currency-name > a @ href",
      "Symbol": "td.text-left.col-symbol",
      "Price": "td:nth-child(5) > a"
    }]
  }
}

console.log($('body').scrape(frame, {
  string: true
}))
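For reference, here is roughly the shape that frame produces: one object per matched tr, collected under the currency key. The values below are illustrative placeholders, not live CoinMarketCap data:

```javascript
// Illustrative only: approximate output shape of the List/Array frame above.
// Field values are made up, not real scraped data.
const sample = {
  currency: [
    { Coin: 'Bitcoin', Url: '/currencies/bitcoin/', Symbol: 'BTC', Price: '$8,120.50' },
    { Coin: 'Ethereum', Url: '/currencies/ethereum/', Symbol: 'ETH', Price: '$690.12' }
  ]
}

// Each table row becomes one object in the array
console.log(Array.isArray(sample.currency)) // prints true
```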

Upvotes: 6

Robert Rossmann

Reputation: 12129

The method cheerio.load() does not accept URLs; it requires the HTML as a string.

While I have not looked into cheerio's source code, it would seem that the module tries to parse the URL string itself as an HTML document, which obviously fails, and various errors start to appear.

To fix the problem, you first need to load the HTML content of that URL into a variable and then pass that HTML to cheerio.

You can do that with modules like request or got.

Here's an example of loading the page using got:

const got = require('got')
const cheerio = require('cheerio')

got('https://google.com')
.then(res => {
  const $ = cheerio.load(res.body)
  // Continue as usual
})
.catch(console.error)
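As a side note, this kind of mistake can be caught early with a small guard before calling cheerio.load. The helper below is purely illustrative (it is not part of cheerio's API); it just checks that the input is a string and does not look like a URL:

```javascript
// Hypothetical helper, not part of cheerio: cheerio.load() parses its
// argument as HTML, so a URL passed by mistake yields a useless document.
function assertLooksLikeHtml(input) {
  if (typeof input !== 'string') {
    throw new TypeError('expected an HTML string')
  }
  if (/^https?:\/\//i.test(input.trim())) {
    throw new TypeError('got a URL, not HTML - fetch the page first and pass res.body')
  }
  return input
}
```

With this in place, cheerio.load(assertLooksLikeHtml(res.body)) works as before, while cheerio.load(assertLooksLikeHtml('https://...')) fails loudly instead of silently producing an empty document.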

Upvotes: -1
