Reputation: 1
I'm very new to JavaScript and to Puppeteer.
I'm trying to grab some innerHTML from a series of web pages inside a forum. The pages' URLs follow a pattern that has a prefix and '/page-N' at the end, N being the page number.
So I decided to loop through the pages using a for loop and template literals to load a new page URL on each loop, until I reach the final number of pages, contained in the variable C.numberOfPages.
The problem is that the code inside the page.evaluate() function is not working: when I run my script I get TypeError: Cannot read property of undefined. I've checked, and the source of the problem is that document.getElementById('discussion_subentries') is returning undefined.
I've tested the same code that is inside the page.evaluate() function in Chrome DevTools and it works fine, returning the innerHTML I wanted. All of those chained .children[] lookups are necessary because of the structure of the page I'm scraping, and they work fine in the browser, returning the proper value.
So how do I make it work in my Puppeteer script?
for (let i = 1; i <= C.numberOfPages; i++) {
  let URL = `${C.url}page-${i}`;
  await page.goto(URL);
  await page.waitForSelector('#discussion_subentries');
  let pageData = await page.evaluate(() => {
    let discussionEntries = document.getElementById('discussion_subentries')
      .children[1];
    let discussionEntryMessages = [];
    for (let j = 0; j < discussionEntries.childElementCount; j++) {
      let thisEntryMessage =
        discussionEntries.children[j].children[0].children[1].children[1]
          .children[1].innerHTML;
      discussionEntryMessages.push(thisEntryMessage);
    }
    return discussionEntryMessages;
  });
  entryData.discussionEntryMessages.push(pageData);
}
Upvotes: 0
Views: 171
Reputation: 191
page.evaluate() is not the problem; it behaves exactly like the DevTools console. Most likely waitForSelector isn't doing the job you expect and doesn't wait for the element to be fully loaded before the script moves on. To confirm that this is the problem, try debugging with a sleep in place of the waitForSelector call.
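A rough sketch of that debugging step, in case it helps. Both sleep and probeDiscussion are hypothetical helpers I made up for illustration, not Puppeteer APIs, and the 5000 ms delay is an arbitrary guess. probeDiscussion reports which lookup fails instead of throwing. Note that getElementById returns null (not undefined) when nothing matches, so an undefined value may actually be coming from .children[1] rather than from the container itself.

```javascript
// Hypothetical helper: pause for `ms` milliseconds using a plain Promise.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper intended to be passed to page.evaluate().
// `doc` defaults to the page's own `document` when called with no arguments;
// the parameter exists only so the function can be exercised with a stub.
function probeDiscussion(doc = document) {
  const container = doc.getElementById('discussion_subentries');
  if (!container) {
    return { ok: false, reason: 'container not found' };
  }
  const list = container.children[1];
  if (!list) {
    return { ok: false, reason: 'container has no second child yet' };
  }
  return { ok: true, entryCount: list.childElementCount };
}

// Assumed usage inside the loop from the question (`page` and `URL` as there):
//   await page.goto(URL);
//   await sleep(5000); // temporary, for debugging only
//   console.log(await page.evaluate(probeDiscussion));
```

If the probe reports ok: true after the sleep but fails without it, timing is the likely culprit, and waiting for a deeper selector (one that matches the entries themselves rather than the outer container) would probably be a better long-term fix than a fixed delay.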
Upvotes: 1