Reputation: 1905
I'm using cheerio to scrape https://www.snopes.com.
I'm trying to get each article's date, but I get the date back repeated many times, with other dates mixed in.
The source shows <span class="date">9 May 2019</span>
but I'm getting:
9 May 20199 May 20198 May 20198 May 20198 May 20199 May 20199 May 20198 May 20198 May 20198 May 20198 May 20197 May 20192 May 201923 April 201916 April 20193 May 20196 May 20196 May 20197 May 20192 May 20199 May 20199 May 20199 May 20199 May 20199 May 2019
const cheerio = require('cheerio');
const request = require('request');

request('https://www.snopes.com', function (error, response, html) {
  if (!error && response.statusCode == 200) {
    const $ = cheerio.load(html);
    const articleRows = $('.media-list .media-wrapper a');
    const articleText = $(articleRows).children(".media-body-wrapper").children(".media-body");

    articleText.each((i, el) => {
      let articleDate = $(articleText).children("p").children(".date");
      console.log(articleDate.text());
    })

    articleRows.each((i, el) => {
      let imageURL = $(el).children(".featured-media").children("img").attr('data-lazy-src');
    })
  }
});
How can I retrieve exactly what I see in the source?
Upvotes: 1
Views: 62
Reputation: 40384
The problem is that inside the callback you query the whole collection, articleText, again, so .text() returns the text of every matched date joined together.
Use the current element of the iteration, el, instead:
articleText.each((i, el) => {
  // el is the single .media-body element for this iteration
  let articleDate = $(el).children("p").children(".date");
  console.log(articleDate.text());
})
Upvotes: 2