Reputation: 563
I have a JavaScript function that I am using to scrape pages with Puppeteer. With a single value it works, but if I introduce a for loop to iterate through an array of values, it fails. I'd like to know where the right place to introduce the for loop would be.
This is my working basic script:
const puppeteer = require('puppeteer');
var listOfURLs = [url1, url2, url3, url4, url5]
let scrape = async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(listOfURLs[0]);
  const result = await page.evaluate(() => {
    let title = document.querySelector('#innerLeft > div.dspPropertyTitle > h1').innerText;
    return {
      title
    }
  });
  browser.close();
  return result;
};
scrape().then((value) => {
  console.log(value);
});
My URLs are contained in the variable listOfURLs. If I manually reference listOfURLs[0], as in the example above, it works just fine. Now I want it to loop through the whole array and access values as listOfURLs[i], so I tried this, but it didn't work. I don't know what is wrong.
const puppeteer = require('puppeteer');
var listOfURLs = [url1, url2, url3, url4, url5]
for (i=0; i<=listOfURLs.length; i++) {
  let scrape = async () => {
    const browser = await puppeteer.launch({headless: true});
    const page = await browser.newPage();
    await page.goto(listOfURLs[i]);
    const result = await page.evaluate(() => {
      let title = document.querySelector('#innerLeft > div.dspPropertyTitle > h1').innerText;
      return {
        title
      }
    });
    browser.close();
    return result;
  };
  scrape().then((value) => {
    console.log(value);
  });
}
Upvotes: 0
Views: 314
Reputation: 370599
i is not block-scoped here (it is never declared with let, so all iterations share one variable), and scrape is async. By the time a scrape call resumes after its first await, the for loop will have finished, so i will have become listOfURLs.length + 1, which means accessing listOfURLs[i] later won't work.
Use let instead, so that each iteration has a separate binding for i.
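To make the difference concrete, here is a minimal sketch that runs without Puppeteer. The helper tick and the function collectIndices are hypothetical names used only for illustration; tick stands in for any asynchronous step such as puppeteer.launch(). The first loop uses var, which shares one binding across iterations in the same way the undeclared i in the question does.

```javascript
// Stand-in for an async step such as puppeteer.launch(); resolves on a later tick.
const tick = () => new Promise((resolve) => setTimeout(resolve, 0));

async function collectIndices(length) {
  const seenWithVar = [];
  const seenWithLet = [];
  const pending = [];

  // One shared binding: every callback reads the final value of i.
  for (var i = 0; i < length; i++) {
    pending.push(tick().then(() => seenWithVar.push(i)));
  }

  // A fresh binding per iteration: each callback sees its own j.
  for (let j = 0; j < length; j++) {
    pending.push(tick().then(() => seenWithLet.push(j)));
  }

  await Promise.all(pending);
  return { seenWithVar, seenWithLet };
}

// seenWithVar ends up as [3, 3, 3]; seenWithLet ends up as [0, 1, 2].
collectIndices(3).then(console.log);
```

With the shared binding, every callback runs after the loop has already finished, so they all read the same final index.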
You also should test i < listOfURLs.length, not i <= listOfURLs.length, because listOfURLs[listOfURLs.length] will be undefined:
for (let i=0; i < listOfURLs.length; i++) {
But these sorts of for loops are pretty ugly and are frequent sources of problems like this. You might consider forEach instead, which has better abstraction, has function scope (is composable), and doesn't require manual iteration, if you're OK sending requests in parallel:
listOfURLs.forEach(async (url) => {
  const scrape = async () => {
    const browser = await puppeteer.launch({headless: true});
    const page = await browser.newPage();
    await page.goto(url);
    const result = await page.evaluate(() => {
      const title = document.querySelector('#innerLeft > div.dspPropertyTitle > h1').innerText;
      return { title };
    });
    browser.close();
    return result;
  };
  scrape().then((value) => {
    console.log(value);
  });
});
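If you want the parallel results collected into one array rather than logged one at a time, a possible variation is map combined with Promise.all. This is only a sketch: scrapeTitle is a hypothetical stand-in for the Puppeteer code above, and the URLs are placeholder strings.

```javascript
// Hypothetical stand-in for the Puppeteer scrape of a single URL.
const scrapeTitle = async (url) => {
  await new Promise((resolve) => setTimeout(resolve, 5));
  return { title: `title of ${url}` };
};

// Start every scrape at once; Promise.all resolves once all have finished
// and keeps the results in the same order as the input array.
const scrapeAll = (urls) => Promise.all(urls.map((url) => scrapeTitle(url)));

scrapeAll(['url1', 'url2', 'url3']).then((values) => {
  console.log(values);
});
```

Note that Promise.all rejects as soon as any one scrape fails, so a real script would probably want a catch on the returned promise.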
(Another option using array methods is reduce, if you want to make the requests in serial.)
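A rough sketch of the reduce approach, assuming a hypothetical scrapeOne function in place of the real Puppeteer code: each step is chained onto the promise from the previous step, so only one request is in flight at a time.

```javascript
// Hypothetical stand-in for the Puppeteer scrape of a single URL.
const scrapeOne = async (url) => {
  await new Promise((resolve) => setTimeout(resolve, 5));
  return { title: `title of ${url}` };
};

// Build a promise chain with reduce: each scrape starts only after
// the previous one has resolved, and the results accumulate in order.
const scrapeInSeries = (urls) =>
  urls.reduce(
    (chain, url) =>
      chain.then(async (results) => [...results, await scrapeOne(url)]),
    Promise.resolve([])
  );

scrapeInSeries(['url1', 'url2', 'url3']).then((values) => {
  console.log(values);
});
```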
Upvotes: 1
Reputation: 31
Your issue is probably stemming from the fact that you're making asynchronous calls inside the for loop. You want a result before you move on to the next one, but because the call is asynchronous, the loop does not wait for a response; it continues straight on to the next URL in the array.
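One way to make the loop wait, sketched here under the assumption that the loop itself lives inside an async function, is to await each call in a for...of loop. scrapeWithLog is a hypothetical stand-in for the Puppeteer code; it records when each URL starts and ends so the ordering is visible.

```javascript
// Hypothetical stand-in for the Puppeteer scrape; logs start and end of each URL.
const scrapeWithLog = async (url, log) => {
  log.push(`start ${url}`);
  await new Promise((resolve) => setTimeout(resolve, 5));
  log.push(`end ${url}`);
  return { title: `title of ${url}` };
};

const scrapeEach = async (urls) => {
  const log = [];
  const results = [];
  for (const url of urls) {
    // await pauses this function until the current URL is done,
    // so the next iteration cannot start early.
    results.push(await scrapeWithLog(url, log));
  }
  return { results, log };
};

// log: start url1, end url1, start url2, end url2
scrapeEach(['url1', 'url2']).then(({ log }) => console.log(log));
```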
Upvotes: 0