gb_spectrum

Reputation: 2301

phantom - can't use loop with page.evaluate

I'm using phantom 6.0.3 to scrape a web page. Here is the initial setup:

(async function () {
    const instance = await phantom.create(['--ignore-ssl-errors=yes', '--load-images=no', '--web-security=false'], {logLevel: 'error'});
    const page = await instance.createPage();
    await page.on('onResourceRequested', function (requestData) {
        console.info('Requesting', requestData.url);
    });

    const url = '...'; // Some url

    const status = await page.open(url);
    const content = await page.evaluate(function () {
        // querySelectorAll returns every matching li; querySelector returns only the first
        return document.querySelectorAll('ul > li');
    });

    const contentLength = content.length; // 5

    //Code Block 2 goes here
})();

So far everything works fine: the script correctly determines that the length of the content is 5 (there are 5 li items). What I want to do now is get the innerText of each of those li elements, and this is where I run into a problem.

I've tried using a for loop to retrieve the innerText of each li element, but it always returns null. Here's what I've tried:

//Code Block 2:
for (let i = 0; i < contentLength; i++) {
    const info = await page.evaluate(function () {
        const element = document.querySelectorAll('ul > li');
        return element[i].innerText;
    });

    console.log(info); // this returns null 5 times
}

I don't know what's going on. If I hard-code a specific index, such as return element[3].innerText, I get the correct innerText, but I can't get this working in a loop.

Upvotes: 2

Views: 503

Answers (1)

Daniel Krom

Reputation: 10068

PhantomJS evaluates the function passed to page.evaluate in a different context (the web page), so it cannot see variables from the surrounding Node.js scope, such as the loop variable i.
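
To illustrate why the closure is lost, here is a minimal sketch that runs in plain Node.js without PhantomJS. The helper fakeEvaluate and the global fakePageItems are hypothetical stand-ins, not part of the phantom API. The sketch assumes the function travels as source text and is rebuilt on the other side, and that an error in the rebuilt function surfaces as null, mirroring the null values reported in the question.

```javascript
// Stand-in for data living in the page; a real page would expose `document` instead.
globalThis.fakePageItems = ['one', 'two', 'three'];

// Hypothetical stand-in for page.evaluate (NOT the phantom API).
// The function is converted to source text and rebuilt in the global scope,
// so it loses access to every variable of the scope it was written in.
function fakeEvaluate(fn, ...args) {
    const source = fn.toString();
    // Arguments are copied by value, here through a JSON round trip.
    const copiedArgs = JSON.parse(JSON.stringify(args));
    try {
        const rebuilt = new Function('return (' + source + ');')();
        return rebuilt(...copiedArgs);
    } catch (err) {
        // Assumption: a failure on the "page" side is reported as null.
        return null;
    }
}

function runDemo() {
    const closureResults = [];
    const argumentResults = [];
    for (let i = 0; i < 3; i++) {
        // Relies on the closure: `i` does not exist where the function is rebuilt.
        closureResults.push(fakeEvaluate(function () {
            return fakePageItems[i];
        }));
        // Receives the index as an explicit argument instead.
        argumentResults.push(fakeEvaluate(function (index) {
            return fakePageItems[index];
        }, i));
    }
    return { closureResults, argumentResults };
}

const demo = runDemo();
console.log(demo.closureResults);  // [ null, null, null ]
console.log(demo.argumentResults); // [ 'one', 'two', 'three' ]
```

The closure version fails on every iteration because i is undefined where the function actually runs, while the version that takes the index as an argument works.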

You should pass i as an extra argument to page.evaluate so that it is forwarded to the page context:

for (let i = 0; i < contentLength; i++) {
    const info = await page.evaluate(function (index) { // notice index argument
        const element = document.querySelectorAll('ul > li');
        return element[index].innerText;
    }, i); // notice second argument is i

    console.log(info);
}
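
As a possible alternative to calling page.evaluate once per item, the texts could be collected in a single call. The following is only a sketch: collectItemTexts is a hypothetical helper name, and it is written in ES5 style (var, no arrow functions) on the assumption that PhantomJS's page context may not support newer syntax.

```javascript
// Hypothetical helper meant to run inside the page context.
// It gathers the innerText of every element matching the selector.
function collectItemTexts(selector) {
    var nodes = document.querySelectorAll(selector);
    // A NodeList is array-like, so borrow Array.prototype.map to build a plain array.
    return Array.prototype.map.call(nodes, function (node) {
        return node.innerText;
    });
}

// Intended usage with the `page` object from the question's setup (not executed here):
// const texts = await page.evaluate(collectItemTexts, 'ul > li');
// texts.forEach(function (text) { console.log(text); });
```

Because the function returns an array of strings rather than DOM nodes, the result can be handed back to Node.js as plain data, and the loop over the items then runs on the Node.js side.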

Upvotes: 4
