I've written a script in Node, in combination with Puppeteer, to scrape the names of different institutions across multiple pages of a website.
The script below parses the institution names from the landing page, then clicks through a few more pages while parsing the names on each one, and finally throws this error partway through execution:
the error: TypeError: Cannot read property 'click' of undefined
at main (c:\Users\WCS\Desktop\Node vault\comments.js:18:25)
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:118:7)
I've used a hardcoded for loop because I don't know how to make the script keep clicking the next-page button until there are none left. What I want is for the script to first look for the next-page button; if it finds one, it should click it and repeat the process.
Here is what I've tried:
const puppeteer = require('puppeteer');
const link = "https://www.incometaxindia.gov.in/Pages/utilities/exempted-institutions.aspx";
(async function main() {
  try {
    const browser = await puppeteer.launch({headless: false});
    const [page] = await browser.pages();
    await page.goto(link);
    await page.waitForSelector("h1.faqsno-heading");
    for (let i = 1; i < 20; i++) {
      const sections = await page.$$("h1.faqsno-heading");
      for (const section of sections) {
        const itemName = await section.$eval("div[id^='arrowex']", el => el.innerText);
        console.log(itemName);
      }
      const nextPage = await page.$$(".ms-paging > a");
      await nextPage[i].click();
      await page.waitForNavigation({waitUntil: 'networkidle0'});
    }
    await browser.close();
  } catch (e) {
    console.log('the error: ', e);
  }
})();
Btw, to keep this from being closed as a duplicate: I have come across this post, but I don't think I can implement its logic in my script on my own.
Replace this code:
const nextPage = await page.$$(".ms-paging > a");
await nextPage[i].click();
await page.waitForNavigation({waitUntil: 'networkidle0'});
with this, which starts waiting for the navigation before clicking so the two calls don't race:
await Promise.all([
  page.waitForNavigation({waitUntil: 'networkidle0'}),
  page.click("[title='Next Page']"),
]);
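The logic the question asks for (look for the next-page button, click it if present, repeat) can be sketched as a loop around page.$, which resolves to null when no element matches. This is a minimal sketch, not a tested solution for this site: it assumes every results page marks its next link with title='Next Page' (the selector used above) and that the last page has no such link. scrapeAllPages and its maxPages cap are hypothetical names, not part of Puppeteer; the extraction reuses the question's h1.faqsno-heading / div[id^='arrowex'] selectors.

```javascript
// Hypothetical helper, not part of Puppeteer: scrapes the current results page,
// then clicks "Next Page" and repeats until that link is no longer present.
// Assumes the site marks the link with title='Next Page' and omits it on the last page.
async function scrapeAllPages(page, { nextSelector = "[title='Next Page']", maxPages = 500 } = {}) {
  const names = [];
  for (let visited = 1; ; visited++) {
    // Same extraction as the question: one name per h1.faqsno-heading.
    const sections = await page.$$("h1.faqsno-heading");
    for (const section of sections) {
      names.push(await section.$eval("div[id^='arrowex']", el => el.innerText));
    }
    // Safety cap so a selector that never disappears cannot loop forever.
    if (visited >= maxPages) break;
    // page.$ resolves to null when nothing matches: no link means last page.
    const next = await page.$(nextSelector);
    if (!next) break;
    // Start waiting for the navigation before clicking so it cannot be missed.
    await Promise.all([
      page.waitForNavigation({ waitUntil: "networkidle0" }),
      next.click(),
    ]);
  }
  return names;
}
```

Usage (hypothetical), inside the question's main() after page.goto(link): const names = await scrapeAllPages(page); names.forEach(name => console.log(name)); The null check on page.$ replaces the hardcoded page count; maxPages only matters if the selector still matches on the last page.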
Alternatively, here is a complete script that instead indexes into the .ms-paging > a links, with a hardcoded number of pages:
const puppeteer = require('puppeteer');
const link = "https://www.incometaxindia.gov.in/Pages/utilities/exempted-institutions.aspx";
(async function main() {
  try {
    const browser = await puppeteer.launch({headless: false});
    const [page] = await browser.pages();
    await page.goto(link);
    await page.waitForSelector("h1.faqsno-heading");
    let j = 0;
    let NoOfPage = 9; // adjust this to the number of pages you want
    for (let i = 0; j < NoOfPage + 1; i++, j++) {
      // From the sixth iteration on, always click the paging link at index 4.
      if (j > 4) {
        i = 4;
      }
      if (i > 0) {
        await page.waitForSelector("h1.faqsno-heading", {visible: true});
        const sections = await page.$$("h1.faqsno-heading");
        for (const section of sections) {
          const itemName = await section.$eval("div[id^='arrowex']", el => el.innerText);
          console.log(itemName);
        }
      }
      const nextPage = await page.$$(".ms-paging > a");
      // No await inside the array: both calls must start together, or the navigation can be missed.
      await Promise.all([
        page.waitForNavigation({waitUntil: 'networkidle0'}),
        nextPage[i].click(),
      ]);
    }
    await browser.close();
  } catch (e) {
    console.log('the error: ', e);
  }
})();
C:\NodeJS\PuppeteerTest\Pup>node stack56652523.js
....
....
HAPPY PUBLIC SCHOOL SAMITI
AABAH3894H
SAGRADA FAMILIA SOCIETY, SOUTH GOA
AAWAS5165K
K V DEVADIGA CHARITABLE TRUST, DAKSHINA KANNADA
AADTK1517B
SHRINE OF INFANT JESUS, CHICKMAGLUR
AAVTS1925P
SRI NANDI VEDACURU CHARITABLE, TRUST
AATTS1842D
SHREE SUBRAHMANYA VANGMAYEE PARISHAD, GOA
AAPTS2410M
SHREE SUBRAHMANYA VANGMAYEE PARISHAD, GOA
AAPTS2410M
WORD FOR THE WORLD FELLOWSHIP
AAAAW6295Q
JANA SEVA TRUST
AACTJ0594Q
VAGDEVI VILAS EDUCATIONAL AND CHARITABLE TRUST
AABTV8264G
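Promise.all only helps if both promises are created before either is awaited; writing Promise.all([await a(), await b()]) runs the two calls one after the other. The browser-free toy model below illustrates why that matters for navigation. makeToyPage is hypothetical, not Puppeteer: its click() fires the "navigated" signal immediately, standing in for a page load that completes before a late waitForNavigation() call starts listening. Puppeteer's own documentation recommends the Promise.all([page.waitForNavigation(), page.click(...)]) shape for clicks that navigate.

```javascript
const { EventEmitter, once } = require("events");

// Toy stand-in for a page (hypothetical, not Puppeteer): click() signals the
// navigation immediately, like a page load that finishes before anyone listens.
function makeToyPage() {
  const emitter = new EventEmitter();
  return {
    async click() {
      emitter.emit("navigated"); // lost if no listener is registered yet
    },
    waitForNavigation(timeoutMs = 100) {
      // Resolves "navigated" if the signal arrives, otherwise "timed out".
      return Promise.race([
        once(emitter, "navigated").then(() => "navigated"),
        new Promise(resolve => setTimeout(() => resolve("timed out"), timeoutMs)),
      ]);
    },
  };
}

// await inside the array: click() finishes before waitForNavigation() starts.
async function awaitInsideArray(page) {
  const [, result] = await Promise.all([await page.click(), await page.waitForNavigation()]);
  return result;
}

// Both promises created first: the listener exists before the click fires.
async function startBothFirst(page) {
  const [result] = await Promise.all([page.waitForNavigation(), page.click()]);
  return result;
}
```

With this model, awaitInsideArray(makeToyPage()) resolves to "timed out" while startBothFirst(makeToyPage()) resolves to "navigated". In a real browser the outcome depends on timing, which is exactly why the race is easy to miss in testing.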
Have you tried a simple if condition?
const nextPage = await page.$$(".ms-paging > a");
if (nextPage && nextPage[i]) {
  await Promise.all([
    page.waitForNavigation({waitUntil: 'networkidle0'}),
    nextPage[i].click(),
  ]);
}
This way the script only clicks when a link exists at index i, instead of throwing a TypeError.
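The TypeError in the question comes from indexing past the end of the array that page.$$ returns: nextPage[i] is undefined, and calling .click() on undefined throws. With only the if guard, the hardcoded for loop would likely keep re-scraping the same last page until i reaches 19. The browser-free sketch below models the handles as plain objects (makeLinks and clickIfPresent are hypothetical names) to show both the failure and a guard that also tells the loop to stop.

```javascript
// Browser-free model: page.$$(...) resolves to a plain array of handles,
// so these fake "links" only need a click() method.
function makeLinks(labels) {
  return labels.map(label => ({ label, click() { return label; } }));
}

// Guarded click: clicks and returns true when a link exists at index i,
// returns false otherwise so the caller can stop instead of throwing.
function clickIfPresent(links, i) {
  if (links && links[i]) {
    links[i].click();
    return true;
  }
  return false;
}

const links = makeLinks(["2", "3", "4"]);

// Unguarded access past the end reproduces the question's error.
try {
  links[3].click();
} catch (e) {
  console.log(e instanceof TypeError); // true
}

// Same shape as the hardcoded loop, but it stops once nothing is clickable.
let clicks = 0;
for (let i = 1; i < 20; i++) {
  if (!clickIfPresent(links, i)) break;
  clicks++;
}
console.log(clicks); // 2
```

In the real script the array is fetched again on every page, so the equivalent would be calling page.$$(".ms-paging > a") inside the loop and breaking when the guard fails.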