Reputation: 73
I am trying to scrape data from a table on a website whose values change constantly, so the row a given country sits in can vary from day to day. I want to be able to scrape the correct row regardless. I am using the Cheerio library, which I am not yet familiar with, and here's what I have:
const rp = require("request-promise");
const cheerio = require("cheerio");

let Italy = "";

async function main() {
  const result = await rp.get("https://www.worldometers.info/coronavirus/");
  const $ = cheerio.load(result);
  // The row position is hardcoded here: always the 2nd row of the table body
  $("#main_table_countries > tbody:nth-child(2) > tr:nth-child(2)").each((i, el) => {
    const item = $(el).text();
    Italy = item;
  });
  console.log(Italy);
}

main().catch(console.error);
As you can see, this scrapes the Worldometer coronavirus table for the row with Italy's cases. However, Italy's position has switched between rows 2 and 3 over the past few days, so the hardcoded tr:nth-child(2) selector sometimes fetches another country's data. This is what I would like to fix.
Here's the link to the worldometer website: https://www.worldometers.info/coronavirus/
Thanks, Karthik
Upvotes: 0
Views: 1126
Reputation: 55002
Use the :contains pseudo-selector to find the row by its text instead of its position:
$('tr:contains(Italy)').text()
//" Italy 9,172 +1,797 463 +97 724 7,985 733 151.7 "
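One caveat: :contains matches substrings, so a short name such as Niger could also match the Nigeria row. A minimal sketch of an exact comparison, assuming the rows have already been pulled out of the page as arrays of cell text with the country name in the first cell; findRowByName and the sample rows are hypothetical, not part of Cheerio or the site.

```javascript
// Hypothetical helper: return the first row whose name cell equals
// `country` exactly (after trimming), or undefined when there is none.
// `rows` is an array of rows, each row an array of cell strings.
function findRowByName(rows, country, nameColumn = 0) {
  return rows.find((cells) => (cells[nameColumn] || "").trim() === country);
}

// Made-up sample rows (not real Worldometer figures).
const rows = [
  ["Nigeria", "100"],
  ["Niger", "200"],
  ["Italy", "300"],
];

// Substring matching, like :contains, stops at the first row mentioning "Niger".
const substringHit = rows.find((cells) => cells.join(" ").includes("Niger"));
console.log(substringHit[0]); // Nigeria

// Exact matching on the name cell picks the intended row.
console.log(findRowByName(rows, "Niger")[0]); // Niger
```

Comparing a single cell also avoids false hits when the country name happens to appear in some other column.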
Upvotes: 1
Reputation: 3642
What I implemented: get all the tr elements, loop over them to collect every country name into an array, and then use a name's array index to locate the row for any country you want.
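const rp = require("request-promise");
const cheerio = require("cheerio");

async function main() {
  const NamesArr = [];
  const CountryToFind = "Italy";
  const result = await rp.get("https://www.worldometers.info/coronavirus/");
  const $ = cheerio.load(result);

  // Collect the country name (first cell) of every row in the first tbody
  $("#main_table_countries").find("tbody").eq(0).find("tr").each((i, el) => {
    NamesArr.push($(el).find("td").eq(0).text().trim());
  });

  // indexOf() is 0-based but nth-child() is 1-based; a missing name gives 0
  const Index = NamesArr.indexOf(CountryToFind) + 1;
  if (Index === 0) {
    console.log(`${CountryToFind} not found`);
    return;
  }

  $(`#main_table_countries > tbody:nth-child(2) > tr:nth-child(${Index})`).each((i, el) => {
    const item = $(el).text();
    console.log(item);
  });
}

main().catch(console.error);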
This prints the full text of Italy's row.
It can certainly be refactored, but this approach makes the parser dynamic: you can look up any country just by changing CountryToFind.
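One possible refactoring direction, sketched under assumptions: instead of computing an index and rebuilding an nth-child selector for every lookup, index the rows once by country name in a Map. indexRowsByName and the sample data are hypothetical; the sketch assumes the rows were already extracted as arrays of cell text with the country name in the first cell, as in the loop above.

```javascript
// Hypothetical helper (not from the answer): build a Map from each row's
// trimmed country name to its cells, keeping the first row seen per name.
function indexRowsByName(rows, nameColumn = 0) {
  const byName = new Map();
  for (const cells of rows) {
    const name = (cells[nameColumn] || "").trim();
    if (name !== "" && !byName.has(name)) {
      byName.set(name, cells);
    }
  }
  return byName;
}

// Made-up sample rows; in practice these would come from the scraped table.
const rows = [
  ["USA", "100"],
  ["Italy", "200"],
  ["Spain", "300"],
];
const byName = indexRowsByName(rows);
console.log(byName.get("Italy"));  // [ 'Italy', '200' ]
console.log(byName.get("France")); // undefined
```

A missing country then shows up as undefined rather than as a silent tr:nth-child(0) selector that matches nothing.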
Upvotes: 1