froggomad

Reputation: 1905

Cheerio returning text not in the DOM

I'm using cheerio to scrape https://www.snopes.com

I'm trying to get an article's date, but I'm getting back the date repeated numerous times, and sometimes another date for good measure.

The source shows <span class="date">9 May 2019</span> but I'm getting:

9 May 20199 May 20198 May 20198 May 20198 May 20199 May 20199 May 20198 May 20198 May 20198 May 20198 May 20197 May 20192 May 201923 April 201916 April 20193 May 20196 May 20196 May 20197 May 20192 May 20199 May 20199 May 20199 May 20199 May 20199 May 2019

const cheerio = require('cheerio');
const request = require('request');
request('https://www.snopes.com', function (error, response, html) {
  if (!error && response.statusCode == 200) {
    const $ = cheerio.load(html);
    const articleRows = $('.media-list .media-wrapper a');
    const articleText = $(articleRows).children(".media-body-wrapper").children(".media-body");
    articleText.each((i,el) => {
      let articleDate = $(articleText).children("p").children(".date");
      console.log(articleDate.text());
    })

    articleRows.each((i, el) => {
      let imageURL = $(el).children(".featured-media").children("img").attr('data-lazy-src');
    })
  }
});

How can I retrieve exactly what I see in the source?

Upvotes: 1

Views: 62

Answers (1)

Marcos Casagrande

Reputation: 40384

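The problem is that inside the callback you wrap the whole collection again: $(articleText). Calling .text() on a collection returns the combined text of every matched element, so each iteration prints all the dates concatenated. You should use the current element of the iteration instead: el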

articleText.each((i, el) => {
  let articleDate = $(el).children("p").children(".date");
  console.log(articleDate.text());
});

Upvotes: 2
