Reputation: 149
I want my program to do this:
Steps 1 and 2 are working fine but I'm running into timeout error with step 3. Based on responses to similar questions on StackOverflow, I used waitForNavigation() with bigger timeout spans (up to 2 min) but I'm still getting the same error. Using waitForSelector() instead of waitForNavigation() is also giving the same error. If I remove both, puppeteer takes a screenshot of the webpage in step 1. I have also tried using different options with waitUntil, such as "domcontentloaded", "loaded", "networkidle0" and "newtorkidle2", but nothing is working. This is my first program in puppeteer and I've been stuck on this problem for a long time.
Here's my code:
await page.waitForSelector('#featured > c-wiz > div.OXo54d > div > div > div > span > span > span.veMtCf');
// await navigation;
await page.screenshot({path: 'learnmore.png'});
console.log('GOT THIS FAR:)');
//await page.close();
await browser.close();
return 0;
Here's the complete program:
const puppeteer = require('puppeteer');
(async () => {
try{
const browser = await puppeteer.launch({headless: false});
const page = await browser.newPage();
// const navigationPromise = page.waitForNavigation({waitUntil: "load"});
//google.com
await page.goto('https://google.com');
await page.type('input.gLFyf.gsfi',"hotels in london");
await page.keyboard.press('Enter');
//search results
// await navigationPromise;
await page.waitForSelector('#rso > div:nth-child(2) > div > div > div > g-more-link > a > div');
await page.click('#rso > div:nth-child(2) > div > div > div > g-more-link > a > div');
//list of hotels
// await navigationPromise;
await page.waitForSelector('#yDmH0d > c-wiz.zQTmif.SSPGKf > div > div.lteUWc > div > c-wiz > div > div.gpcwnc > div.cGQUT > main > div > div.Hkwcrd.Sy8xcb.XBQ4u > c-wiz > div.J6e2Vc > div > div > span > span');
await page.click("#yDmH0d > c-wiz.zQTmif.SSPGKf > div > div.lteUWc > div > c-wiz > div > div.gpcwnc > div.cGQUT > main > div > div.Hkwcrd.Sy8xcb.XBQ4u > c-wiz > div.l5cSPd > c-wiz:nth-child(3) > div > div > div > div.kCsInf.ZJqrAd.qiy8jf.G9g6o > div > div.TPQEac.qg10C.RCpQOe > a > button > span");
//"learn more"
// await navigationPromise;
//This is where timeout error occurs:
await page.waitForSelector('#featured > c-wiz > div.OXo54d > div > div > div > span > span > span.veMtCf');
// await navigation;
await page.screenshot({path: 'learnmore.png'});
console.log('GOT THIS FAR:)');
//await page.close();
await browser.close();
return 0;
}
catch(err){
console.error(err);
}
})()
.then(resolvedValue => {
console.log(resolvedValue);
})
.catch(rejectedValue => {
console.log(rejectedValue);
})
Upvotes: 0
Views: 3349
Reputation: 8841
Your timeout occurs because the selector you are waitng for is not exist on the page. (If you are opening the browser console where the script stucks and launch $(selector)
it will return null
)
Google uses dynamic class and id values, exactly to prevent (or to make it harder) to retrieve data by scripts, the selectors will have different values everytime you visit the page.
If you really need to scrape its content you can use XPath selectors which are less fragile compared to dynamically changing selector names:
E.g.:
await page.waitForXpath('//h3[contains(text(), "The Best Hotels in London")]')
const link = await page.$x('//h3[contains(text(), "The Best Hotels in London")]')
await link[0].click()
Upvotes: 1