Alteredorange

Reputation: 636

Continue on Null Value of Result (Nodejs, Puppeteer)

I'm just starting to play around with Puppeteer (Headless Chrome) and Nodejs. I'm scraping some test sites, and things work great when all the values are present, but if a value is missing I get an error like:

Cannot read property 'src' of null

In the code below, for example, the first two passes might have all the values, but on the third pass there is no picture, so it just errors out.

Previously I was using if (!picture) continue; but I think it no longer works because of the for loop.

Any help would be greatly appreciated, thanks!

for (let i = 1; i <= 3; i++) {
//...Getting to correct page and scraping it three times
  const result = await page.evaluate(() => {
      let title = document.querySelector('h1').innerText;
      let article = document.querySelector('.c-entry-content').innerText;
      let picture = document.querySelector('.c-picture img').src;

      if (!document.querySelector('.c-picture img').src) {
        let picture = 'No Link';     }  //throws error

      let source = "The Verge";
      let categories = "Tech";

      if (!picture)
                continue;  //throws error

      return {
        title,
        article,
        picture,
        source,
        categories
      }
    });
}

Upvotes: 2

Views: 2681

Answers (2)

skylize

Reputation: 1431

let picture = document.querySelector('.c-picture img').src;

if (!document.querySelector('.c-picture img').src) {
    let picture = 'No Link';     }  //throws error

If there is no picture, then document.querySelector() returns null, which does not have a src property. You need to check that your query found an element before trying to read the src property.
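As a compact way to express that check, here is a minimal sketch, assuming a runtime that supports optional chaining (`?.`) and nullish coalescing (`??`), which is newer syntax than the code in the question uses. The `srcOrDefault` name and the stand-in document objects are hypothetical and exist only so the snippet runs outside a browser.

```javascript
// Hypothetical helper: read `src` from the first match, or fall back.
// `doc` is anything with a querySelector method (the real `document`
// inside page.evaluate, or a stand-in object as used here).
function srcOrDefault(doc, selector, fallback = 'No Link') {
  // `?.` yields undefined when querySelector returns null, so reading
  // `src` never throws; `??` then substitutes the fallback value.
  return doc.querySelector(selector)?.src ?? fallback;
}

// Stand-in documents for illustration only (assumptions, not real pages).
const withPicture = { querySelector: () => ({ src: 'https://example.com/a.png' }) };
const withoutPicture = { querySelector: () => null };

console.log(srcOrDefault(withPicture, '.c-picture img'));
console.log(srcOrDefault(withoutPicture, '.c-picture img'));
```

Inside `page.evaluate`, the same expression would be written against the real `document`.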

Moving the null-check to the top of the function has the added benefit of saving unnecessary calculations when you are just going to bail out anyway.

async function scrape3() {
  // ... 
  for (let i = 1; i <= 3; i++) {
  //...Getting to correct page and scraping it three times
    const result = await page.evaluate(() => {
        const pictureElement = document.querySelector('.c-picture img');
      
        if (!pictureElement) return null;
      
        const picture = pictureElement.src;
        const title = document.querySelector('h1').innerText;
        const article = document.querySelector('.c-entry-content').innerText;

        const source = "The Verge";
        const categories = "Tech";

        return {
          title,
          article,
          picture,
          source,
          categories
        }
    });

    if (!result) continue;

    // ... do stuff with result
  }
}

Answering comment question: "Is there a way just to skip anything blank, and return the rest?"

Yes. You just need to check the existence of each element that could be missing before trying to read a property off of it. In this case we can omit the early return since you're always interested in all the results.

async function scrape3() {
  // ...
  for (let i = 1; i <= 3; i++) {
    const result = await page.evaluate(() => {
        const img = document.querySelector('.c-picture img');
        const h1 = document.querySelector('h1');
        const content = document.querySelector('.c-entry-content');

        const picture = img ? img.src : '';
        const title = h1 ? h1.innerText : '';
        const article = content ? content.innerText : '';
        const source = "The Verge";
        const categories = "Tech";

        return {
          title,
          article,
          picture,
          source,
          categories
        }
    });
    // ... 
  }
}

Further thoughts

Since I'm still on this question, let me take this one step further and refactor it a bit with some higher-level techniques you might be interested in. Not sure if this is exactly what you are after, but it should give you some ideas about writing more maintainable code.

// Generic reusable helper to return an object property
// if object exists and has property, else a default value
// 
// This is a curried function accepting one argument at a
// time and capturing each parameter in a closure.
//
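// `key in object` also finds inherited properties such as
// src and innerText, which DOM elements get from their prototype.
const maybeGetProp = fallback => key => object =>
  (object && key in object) ? object[key] : fallback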

// Pass in empty string as the default value
//
const getPropOrEmptyString = maybeGetProp('')

// Apply the second parameter, the property name, making 2
// slightly different functions which have a default value
// and a property name pre-loaded. Both functions only need
// an object passed in to return either the property if it
// exists or an empty string.
//
const maybeText = getPropOrEmptyString('innerText')
const maybeSrc = getPropOrEmptyString('src')

async function scrape3() {
  // ...

  // The _ parameter name acknowledges that we expect an
  // argument to be passed in but plan to ignore it.
  //
  const evaluate = _ => page.evaluate(() => {
    
    // Attempt to retrieve the desired elements
    // 
    const img = document.querySelector('.c-picture img');
    const h1 = document.querySelector('h1')
    const content = document.querySelector('.c-entry-content')

    // Return the results, with empty string in
    // place of any missing properties.
    // 
    return {
      title: maybeText(h1),
      article: maybeText(content),
      picture: maybeSrc(img),
      source: 'The Verge',
      categories: 'Tech'
    }
  })

  // Start with an empty array of length 3
  // 
  const evaluations = Array(3).fill()

    // Then map over that array ignoring the undefined
    // input and return a promise for a page evaluation
    //
    .map(evaluate)

  // All 3 scrapes are occurring concurrently. We'll
  // wait for all of them to finish.
  //
  const results = await Promise.all(evaluations)

  // Now we have an array of results, so we can 
  // continue using array methods to iterate over them
  // or otherwise manipulate or transform them
  // 
  results
    .filter(result => result.title && result.picture)
    .forEach(result => {
      //
      // Do something with each result
      // 
    })
}
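One caveat about the refactor above: the function passed to `page.evaluate` is serialized and run inside the browser page, so helpers defined in the Node.js script (such as `maybeText` and `maybeSrc`) are not in scope there. A minimal sketch of one way around that, assuming the helpers are simply declared inside the page function. `extractFields` and the stand-in document are hypothetical names used for illustration.

```javascript
// Hypothetical page function: everything it needs is declared inside it,
// because page.evaluate runs it in the browser, not in Node.js.
// `doc` defaults to the global `document` when called with no arguments.
function extractFields(doc = document) {
  // Curried helper: property value if the object exists and has it,
  // otherwise the fallback.
  const maybeGetProp = fallback => key => object =>
    (object && key in object) ? object[key] : fallback;
  const maybeText = maybeGetProp('')('innerText');
  const maybeSrc = maybeGetProp('')('src');

  return {
    title: maybeText(doc.querySelector('h1')),
    article: maybeText(doc.querySelector('.c-entry-content')),
    picture: maybeSrc(doc.querySelector('.c-picture img')),
    source: 'The Verge',
    categories: 'Tech',
  };
}

// Stand-in document with no picture element (an assumption for the demo).
const fakeDoc = {
  querySelector: selector => ({
    'h1': { innerText: 'A headline' },
    '.c-entry-content': { innerText: 'Body text' },
  }[selector] || null),
};

console.log(extractFields(fakeDoc));
```

In a real script this would be passed as `await page.evaluate(extractFields)`, so the function runs against the page's own `document`.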

Upvotes: 3

Giorgi Tediashvili

Reputation: 1

Try-catch worked for me:

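let name = '';
try {
    // page.$eval throws if no element matches the selector.
    // 'element' and textContent are placeholders; use your own selector and property.
    name = await page.$eval('element', el => el.textContent);
} catch (error) {
    name = '';
}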
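If you would rather not use an exception for the "not found" case, a possible alternative is `page.$`, which resolves to `null` when the selector matches nothing. This is only a sketch: `textOrDefault` is a hypothetical helper name, and reading `textContent` is an assumption about which property you want.

```javascript
// Hypothetical helper: resolve to the element's text, or to `fallback`
// when the selector matches nothing on the page.
async function textOrDefault(page, selector, fallback = '') {
  const handle = await page.$(selector); // null when nothing matches
  if (!handle) return fallback;
  // Runs in the browser with the matched element as its argument.
  return handle.evaluate(el => el.textContent);
}

// Usage inside an async function with a Puppeteer `page`:
//   const name = await textOrDefault(page, 'element');
```

The lookup and the property read stay in one place, so each scraped field becomes a single call with its own default.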

Upvotes: 0
