Reputation: 2301
I'm using phantom 6.0.3 to scrape a web page. Here is the initial setup:
const phantom = require('phantom');

(async function () {
  const instance = await phantom.create(['--ignore-ssl-errors=yes', '--load-images=no', '--web-security=false'], {logLevel: 'error'});
  const page = await instance.createPage();
  // Log every resource the page requests
  await page.on('onResourceRequested', function (requestData) {
    console.info('Requesting', requestData.url);
  });
  const url = // Some url
  const status = await page.open(url);
  // querySelectorAll returns every matching li, not just the first one
  const content = await page.evaluate(function () {
    return document.querySelectorAll('ul > li');
  });
  const contentLength = content.length // 5
  //Code Block 2 goes here
})();
So far everything works fine. It successfully determines that the length of the content is 5 (there are 5 li items). What I want to do now is get the innerText of each of those li elements, and this is where I run into my issue.
I've tried using a for loop to retrieve the innerText of each li element, but it always returns null. Here's what I've tried:
//Code Block 2:
for (let i = 0; i < contentLength; i++) {
  const info = await page.evaluate(function () {
    const element = document.querySelectorAll('ul > li');
    return element[i].innerText;
  });
  console.log(info); // this logs null 5 times
}
I don't know what's going on. If I hard-code a specific index, such as return element[3].innerText, I get the correct innerText, but I can't get this working in a loop.
Upvotes: 2
Views: 503
Reputation: 10068
PhantomJS evaluates the function in a different context (inside the page), so it has no access to the loop variable i from your Node.js code.
You should pass i as an argument to evaluate so that it is forwarded to the browser process:
for (let i = 0; i < contentLength; i++) {
  const info = await page.evaluate(function (index) { // notice index argument
    const element = document.querySelectorAll('ul > li');
    return element[index].innerText;
  }, i); // notice second argument is i
  console.log(info);
}
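To see why the closure variable gets lost, here is a rough simulation that runs in plain Node.js without PhantomJS. It is only a sketch: fakeEvaluate and pageItems are hypothetical names standing in for page.evaluate and the page's li elements, and returning null on failure is an assumption made to mirror the output in the question, not a description of how the library is implemented.

```javascript
// Hypothetical stand-in for the data that lives inside the page.
globalThis.pageItems = ['first', 'second', 'third'];

// Hypothetical stand-in for page.evaluate: the callback is converted to its
// source text and rebuilt, so it no longer sees the caller's local variables.
function fakeEvaluate(fn, ...args) {
  const source = fn.toString();
  // new Function builds the function in the global scope, not the caller's scope.
  const rebuilt = new Function('return (' + source + ');')();
  try {
    return rebuilt(...args);
  } catch (err) {
    return null; // assumption: mimic the null seen in the question
  }
}

// Relies on the loop variable i from the enclosing scope: fails after rebuilding.
function readWithClosure() {
  const results = [];
  for (let i = 0; i < 3; i++) {
    results.push(fakeEvaluate(function () { return pageItems[i]; }));
  }
  return results;
}

// Passes i explicitly as an argument: the value travels with the call.
function readWithArgument() {
  const results = [];
  for (let i = 0; i < 3; i++) {
    results.push(fakeEvaluate(function (index) { return pageItems[index]; }, i));
  }
  return results;
}

console.log(readWithClosure());
console.log(readWithArgument());
```

The first call loses i because only the function's text crosses the boundary, while the second works because the argument is sent along with it.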
Upvotes: 4