Reputation: 503
I want to scrape data from a mutual fund website so that I can track only selected funds instead of all of them.
So I tried to use Puppeteer to scrape the dynamic table generated by the website. I managed to get the table, but when I pass it to cheerio for parsing, nothing seems to happen:
const puppeteer = require('puppeteer');
const cheerio = require('cheerio');

const scrapeImages = async (username) => {
  console.log("test");
  const browser = await puppeteer.launch({
    args: ['--no-sandbox']
  });
  const page = await browser.newPage();
  await page.goto('https://www.publicmutual.com.my/Our-Products/UT-Fund-Prices');
  // Give the page time to render the dynamic table
  await page.waitFor(5000);
  const data = await page.evaluate(() => {
    const tds = Array.from(document.querySelectorAll('div.form-group:nth-child(4) > div:nth-child(1) > div:nth-child(1)'));
    // Returns an array of HTML strings, one per matched element
    return tds.map(td => td.innerHTML);
  });
  await browser.close();
  console.log(data);
  let $ = cheerio.load(data);
  $('table > tbody > tr > td').each((index, element) => {
    console.log($(element).text());
  });
};
scrapeImages("test");
Ultimately, I am not sure how to do this directly with Puppeteer alone, instead of handing the HTML to cheerio for the scraping. I would also like to scrape only selected funds. For instance, if you visit the page at https://www.publicmutual.com.my/Our-Products/UT-Fund-Prices,
I would like to get only the funds with specific abbreviations
instead of all of them. How can I do this with only Puppeteer?
Upvotes: 0
Views: 705
Reputation: 54984
That page already loads jQuery, which is even better than cheerio here:
const rows = await page.evaluate(() => {
  // $ here is the page's own jQuery; each row becomes an array of its td texts
  return $('.fundtable tr').get().map(tr => $(tr).find('td').get().map(td => $(td).text()));
});
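To keep only selected funds, you could filter `rows` after the `page.evaluate` call. This is a minimal sketch, assuming the abbreviation is the text of the first `td` in each row (I have not verified the column order on that page, so adjust `column` if needed). The helper name `filterFundRows` and the sample data are made up for illustration:

```javascript
// Hypothetical helper: keep only the rows whose abbreviation cell matches
// one of the wanted abbreviations (case-insensitive, whitespace-trimmed).
// `column` defaults to 0, which is an assumption about the table layout.
function filterFundRows(rows, wanted, column = 0) {
  const wantedSet = new Set(wanted.map(w => w.trim().toUpperCase()));
  return rows.filter(cells =>
    cells.length > column &&
    wantedSet.has(cells[column].trim().toUpperCase())
  );
}

// Made-up sample shaped like the array returned by page.evaluate:
// one inner array of cell texts per table row.
const sampleRows = [
  [],                      // a header row has no <td> cells, so it comes back empty
  ['FUNDA', '0.2500'],
  [' fundb ', '0.3100'],
  ['FUNDC', '1.0200'],
];

const selected = filterFundRows(sampleRows, ['FUNDA', 'FUNDB']);
console.log(selected);
```

In the real script you would pass the `rows` value from `page.evaluate` instead of `sampleRows`. Filtering in Node rather than inside the page keeps the list of wanted funds out of the browser context, so nothing extra has to be passed into `evaluate`.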
Upvotes: 1