Tendekai Muchenje

Reputation: 563

Where to place a for loop to iterate through an array of values

I have a JavaScript function that I am using to scrape pages with Puppeteer. With a single value it works, but if I introduce a for loop to iterate through an array of values, it fails. I'd like to know where the for loop should go.

This is my working basic script:

const puppeteer = require('puppeteer');
var listOfURLs = [url1, url2,url3,url4,url5]
let scrape = async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(listOfURLs[0]);

  const result = await page.evaluate(() => {
    let title = document.querySelector('#innerLeft > div.dspPropertyTitle > h1').innerText;
    return {
      title
    }

  });

  browser.close();
  return result;
};
scrape().then((value) => {
  console.log(value); 
});

My URLs are contained in the variable listOfURLs. If I manually reference listOfURLs[0], as in the example above, it works just fine. Now I want to loop through the whole array and access each value as listOfURLs[i], so I tried the following, but it didn't work. I don't know what is wrong.

const puppeteer = require('puppeteer');    
var listOfURLs = [url1, url2, url3, url4, url5]
for (i=0; i<=listOfURLs.length; i++) {
  let scrape = async () => {
    const browser = await puppeteer.launch({headless: true});
    const page = await browser.newPage();

    await page.goto(listOfURLs[i]);

    const result = await page.evaluate(() => {
      let title = document.querySelector('#innerLeft > div.dspPropertyTitle > h1').innerText;
      return {
        title
      }

    });

    browser.close();
    return result;
  };
  scrape().then((value) => {
    console.log(value); 
  });
}

Upvotes: 0

Views: 314

Answers (2)

CertainPerformance

Reputation: 370599

i is never declared, so it becomes a single implicit global shared by every iteration, and scrape is async. By the time any scrape call resumes after its first await, the for loop will already have finished, so i will be listOfURLs.length + 1, which means accessing listOfURLs[i] at that point won't work.

Use let instead, so that each iteration has a separate binding for i.

You should also test i < listOfURLs.length, not i <= listOfURLs.length, because listOfURLs[listOfURLs.length] will be undefined:

for (let i=0; i < listOfURLs.length; i++) {
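To make the binding difference concrete, here is a minimal sketch that needs no browser. The function name collect and the sample array are made up for illustration; resolved promises stand in for the awaits inside scrape. A counter declared with var behaves like the undeclared i in the question, because every callback shares one binding.

```javascript
// Records the loop counter as seen by callbacks that run after the loop ends.
// `collect` and `items` are hypothetical names used only for this sketch.
async function collect(useLet) {
  const seen = [];
  const pending = [];
  const items = ['a', 'b', 'c'];
  if (useLet) {
    // let: each iteration gets its own binding of i
    for (let i = 0; i < items.length; i++) {
      pending.push(Promise.resolve().then(() => seen.push(i)));
    }
  } else {
    // var: one binding of j is shared by every callback
    for (var j = 0; j < items.length; j++) {
      pending.push(Promise.resolve().then(() => seen.push(j)));
    }
  }
  await Promise.all(pending);
  return seen;
}

// Shared binding: every callback sees the final value, 3.
collect(false).then((seen) => console.log('shared binding:', seen));
// Per-iteration binding: the callbacks see 0, 1 and 2.
collect(true).then((seen) => console.log('per-iteration binding:', seen));
```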

But these sorts of for loops are pretty ugly and are a frequent source of problems like this. You might consider forEach instead, which is a better abstraction, gives each iteration its own function scope (so it is composable), and doesn't require manual iteration, if you're OK sending requests in parallel:

listOfURLs.forEach((url) => {
  const scrape = async () => {
    const browser = await puppeteer.launch({headless: true});
    const page = await browser.newPage();
    await page.goto(url);
    const result = await page.evaluate(() => {
      const title = document.querySelector('#innerLeft > div.dspPropertyTitle > h1').innerText;
      return { title };
    });
    browser.close();
    return result;
  };
  scrape().then((value) => {
    console.log(value); 
  });
});

(Another option using array methods is reduce, if you want to make the requests in serial.)
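As a rough sketch of the reduce approach, the helper below chains one promise per item so that each task starts only after the previous one has resolved. The names runInSeries and fakeScrape are hypothetical; fakeScrape is a timer-based stand-in for the real scrape function, assumed here to take a URL and return a promise.

```javascript
// Runs `task` over `items` one at a time by chaining promises with reduce.
// `runInSeries` is a hypothetical helper; `task` stands in for scrape(url).
function runInSeries(items, task) {
  return items.reduce(
    (chain, item) =>
      // Wait for the results so far, then run the next task and append its result.
      chain.then(async (results) => [...results, await task(item)]),
    Promise.resolve([])
  );
}

// Stand-in for a real scrape(url): resolves after a short delay.
const fakeScrape = (url) =>
  new Promise((resolve) =>
    setTimeout(() => resolve({ title: `title of ${url}` }), 10)
  );

runInSeries(['url1', 'url2', 'url3'], fakeScrape).then((results) => {
  console.log(results);
});
```

The accumulator is the promise chain itself, which is what forces the ordering: no task is called until the promise before it has settled.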

Upvotes: 1

chanceoneal6

Reputation: 31

Your issue probably stems from making asynchronous calls inside the for loop. You want each result before moving on to the next URL, but because the calls are asynchronous, the loop doesn't wait for a response; it continues straight on to the next URL in the array.
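One way to make the loop wait, sketched under some assumptions: move the loop inside a single async function and await each step with for...of. The name scrapeAll is hypothetical, and the puppeteer module is passed in as the launcher parameter so the sketch does not depend on a browser being installed. It also reuses one browser and one page for every URL, which differs from the question's code.

```javascript
// Visits each URL in turn, finishing one page before starting the next.
// `scrapeAll` is a hypothetical name; `launcher` is assumed to be the
// puppeteer module (anything with a compatible launch() would work).
async function scrapeAll(launcher, urls) {
  const browser = await launcher.launch({ headless: true });
  const results = [];
  try {
    const page = await browser.newPage();
    for (const url of urls) {
      // await pauses this function here, so the loop really does wait.
      await page.goto(url);
      const result = await page.evaluate(() => {
        const title = document.querySelector('#innerLeft > div.dspPropertyTitle > h1').innerText;
        return { title };
      });
      results.push(result);
    }
  } finally {
    // Close the browser even if one of the pages throws.
    await browser.close();
  }
  return results;
}

// Usage with the real library (left commented out so the sketch has no hard dependency):
// scrapeAll(require('puppeteer'), listOfURLs).then((values) => console.log(values));
```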

Upvotes: 0
