Oleg Kontsograda
Oleg Kontsograda

Reputation: 23

Async function not returning resolve to variable/const

I've ben working on making a simple web scraper to get a deeper understanding of using async/await and promise/resolve functionality. I'm having an issue with an async/await function not returning my resolve properly. I'm able to console.log the complete dataset within the function, right before the return resolve, but I'm not able to assign a constant and call the function outside the scope of the function itself, and console.log that data (it's returned as undefined).

This is the function that is supposed to return the dataset i'm scraping for. The saleHeaders are being pulled successfully, because I'm able to scrape for the content and assign to it's respective constant. When using the console.log that's commented out below, I get the complete dataset output.

async function getInnerPosts(saleHeaders) {
  await Promise.all(saleHeaders.map(async (job) => {
    const dataSet = [];
    return new Promise(resolve => {
      request(job.fullURL, (err, res, html) => {
        const $ = cheerio.load(html);
        $('.result-info').each((index, element) => {
            const postTitle     = $(element).children(".result-title").text();
            const postDate      = $(element).children(".result-date").attr('title');
            const postLink      = $(element).children("a").attr('href');
            const postPrice     = $(element).children(".result-meta").children(".result-price").text();
            const postLocation  = $(element).children(".result-meta").children(".result-hood").text().replace(/[{()}]/g, '');
            // gather data to one const
            const fetchedData   = { postTitle, postDate, postLink, postPrice, postLocation };
            dataSet.push(fetchedData);
            // console.log(dataSet);
        });
        return resolve(dataSet);
      });
    });
  }));
}

But when I try to run the function, assigning it to a constant, and then attempt to log that constant, I'm getting undefined without any warning/errors being thrown. I've attempted to re-structure the returning of the dataset and re-writing the complete function from scratch to make sure I didn't miss some minor mistake, but no luck.

async function scrapeData() {
    const saleHeaders = await getForSaleHeader();
    // Loop through the categories and pull the inner-data from the posts page
    const innerPosts = await getInnerPosts(saleHeaders);
    console.log(innerPosts);
}

This is the time, and output i'm receiving when running the above-mentioned code sample:

undefined

real    0m31.272s
user    0m33.594s
sys 0m0.322s

EDIT I am calling the whole script to run as well:

// run the script
scrapeData();

Upvotes: 0

Views: 1164

Answers (3)

Jordan Running
Jordan Running

Reputation: 106037

The root of the problem is that you're mixing Promises with async/await. Remember that if a function returns a Promise (e.g. return new Promise or return Promise.all), then it should not be async.

A smaller but just as significant problem is that getInnerPosts doesn't return anything. It begins with await Promise.all(...). It should begin with return Promise.all(...). Without a return there, getInnerPosts returns undefined, so the function that calls it is essentially just doing await undefined, which immediately resolves to undefined.

Let's break this down step by step. First, get most of this code out of the way by pulling the cheerio selector code into its own function:

function getData(element) {
  const $el = $(element);
  return {
    postTitle: $el.children('.result-title').text(),
    postDate: $el.children('.result-date').attr('title'),
    postLink: $el.children('a').attr('href'),
    postPrice: $el.children('.result-meta > .result-price').text(),
    postLocation: $el.children('.result-meta > .result-hood').text().replace(/[{()}]/g, ''),
  };
}

Next, get rid of the async/awaits and add a return before Promise.all:

function getInnerPosts(saleHeaders) {
  return Promise.all(saleHeaders.map(job => {
    const dataSet = [];
    return new Promise(resolve => {
      request(job.fullURL, (err, res, html) => {
        const $ = cheerio.load(html);
        $('.result-info').each((index, element) => dataSet.push(getData(element)));
        resolve(dataSet);
      });
    });
  }));
}

This should be sufficient to make your code work as expected. However, that pyramid structure is not very easy to read. Let's see if we can improve it.

First, you can wrap your call to request in a function that returns a Promise (alternatively you could use e.g. request-promise-native). Having done that, you can make the callback you pass to saleHeaders.map an async function and use await pRequest(...) inside. Finally, you can eliminate dataSet entirely by using cheerio's map instead of calling push inside an each loop. The end result looks like this:

function pRequest(url) {
  return new Promise((resolve, reject) =>
    request(url, (err, res, html) => err ? reject(err) : resolve(html)));
}

function getInnerPosts(saleHeaders) {
  return Promise.all(saleHeaders.map(async job => {
    const html = await pRequest(job.fullURL);
    const $ = cheerio.load(html);
    return $('.result-info').map((index, element) => getData(element)));
  }));
}

In reply to your comment:

Still trying to wrap my head around the difference between async/await and promises.

Understanding the difference—or, more importantly, the relationship—between async/await and promises is not trivial. It requires three or four layers of understanding. You need to understand how asynchronous callbacks (e.g. the function you pass to request or setTimeout) work in JS, then you need to understand how promises relate to callbacks, and finally how async/await relates to promises. It's a tough nut to crack, and I think most people pick it up just through experience.

If I have any advice, it's to slow down. Think about each part of your code and what it's actually doing. When you write this:

request(url, (err, res, body) => { /* … */ });

…what happens to that second argument (the callback) inside the request function?

When you write this:

fetch(url)
  .then(res => { /* … */ })
  .catch (err => { /* … */ });

…what does fetch(url) return, and what is then? Why can you write this instead (in an async function)?

try {
  const res = await fetch(url);
  // …
} catch (err) {
  // …
}

Will it behave the same way, or are there differences?

Slow down and think about what you're writing. Don't add an await unless you know why you're doing it. Don't add a new Promise unless you know why you're doing it.

And finally, break your code up into smaller functions. It's easy to lose track of where you need to return or where you want an await when your function has a Promise.all, a new Promise, and three nested callbacks. It's much easier to tell if a function could be async if it's only 2–3 lines long.

Is there any advantage of using one over the other?

IMO the main advantage of async/await is readability. Promise constructors and nested then/catch callbacks work, but they tend to obscure the intent of your code. async/await makes your code read more like normal synchronous code (not to mention "await" is an English word that's already familiar) which makes it easier to divine the intent. Of course, you still need Promises. await makes a function effectively pause, but sometimes there's only one thing you want to defer while the rest of the function executes, so you need then. And there's no replacement for Promise.all

Upvotes: 1

hackape
hackape

Reputation: 19967

async function getInnerPosts(saleHeaders) {
  // here
  return await Promise.all(saleHeaders.map(async (job) => {
    const dataSet = [];
    return new Promise(resolve => {

You have to return it because you want to further use the value outside the function.

const innerPosts = await getInnerPosts(saleHeaders);

If you don't, where do you think the value of innerPosts comes from? It's returned from the getInnerPosts!

Upvotes: 0

Schutt
Schutt

Reputation: 1134

The problem is that your return statement only returns inside the map function (map has a return statement but the function itself does not!

Upvotes: 1

Related Questions