user11368144


Promise doesn't wait for a function's promise to be resolved

So I've been working on a scraper project.

I've implemented most of it, but I'm stuck on one thing.

First, let me explain the workflow: the scrapers are called in the scraping-service module, where I wait for the promises returned by those functions to resolve. Data is fetched in the scrapers and passed to the data_functions object, where it is merged, validated, and inserted into the DB.

Now here is the code:

scraping-service

//Scrapers for the sites we want to get apartment data from
const olxScraper = require('./scrapers/olx-scraper');
const santScraper = require('./scrapers/sant-scraper');
const data_functions = require('./data-functions/dataF');

let count = 1;

Promise.all([
  olxScraper.olxScraper(count),
  santScraper.santScraper(count),
]).then(() => data_functions.validateData(data_functions.mergedApartments));

Here I wait for the promises of these two functions, then pass the merged data to the validateData method of data_functions.
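A point worth keeping in mind at this step: Promise.all only waits for the promises it is actually handed, and an async function's promise only waits for work it awaits or returns. The minimal sketch below uses a hypothetical `delay` helper as a stand-in for the HTTP requests (it is not the real scraper code) to show that a function which starts inner work without awaiting it lets Promise.all resolve too early.

```javascript
// Hypothetical stand-in for an HTTP request: resolves after `ms` milliseconds.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Starts the slow work but neither awaits nor returns it ("fire and forget").
async function scraperWithoutAwait(log) {
  delay(50).then(() => log.push('square meters fetched'));
}

// Awaits the slow work, so its promise resolves only once the work is done.
async function scraperWithAwait(log) {
  await delay(50);
  log.push('square meters fetched');
}

async function run(scraper) {
  const log = [];
  await Promise.all([scraper(log)]);
  log.push('validateData called');
  await delay(100); // let any stray timers finish before returning the log
  return log;
}

run(scraperWithoutAwait).then((log) => console.log('without await:', log));
run(scraperWithAwait).then((log) => console.log('with await:', log));
```

Without the await, "validateData called" is logged before "square meters fetched"; with it, the order is reversed.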

Here is the scraper:

const axios = require('axios'); //npm package - promise based http client
const cheerio = require('cheerio'); //npm package - used for web-scraping in server-side implementations
const data_functions = require('../data-functions/dataF');

//olxScraper function; takes as its parameter the count sent from the scraping-service file.
exports.olxScraper = async (count) => {
  const url = `https://www.olx.ba/pretraga?vrsta=samoprodaja&kategorija=23&sort_order=desc&kanton=9&sacijenom=sacijenom&stranica=${count}`;
  //URL where the data is located.
  const olxScrapedData = [];
  try {
    await load_url(url, olxScrapedData); //passing the url and the empty array
  } catch (error) {
    console.log(error);
  }
};

//Loads the URL and starts the process of fetching the raw data.
const load_url = async (url, olxScrapedData) => {
  await axios.get(url).then((response) => {
    const $ = cheerio.load(response.data);
    fetch_raw_html($).each((index, element) => {
      process_single_article($, index, element, olxScrapedData);
    });

    process_fetching_squaremeters(olxScrapedData);
    // if I place data_functions.mergeData(olxScrapedData); here, it will work
  });
};

//Fetches the raw HTML, but only from the divs we want.
const fetch_raw_html = ($) => {
  return $('div[id="rezultatipretrage"] > div')
    .not('div[class="listitem artikal obicniArtikal  i index"]')
    .not('div[class="obicniArtikal"]');
};

//All the logic for extracting the data we want from the raw HTML.
const process_single_article = ($, index, element, olxScrapedData) => {
  $('span[class="prekrizenacijena"]').remove();
  const getLink = $(element).find('div[class="naslov"] > a').attr('href');
  const getDescription = $(element).find('div[class="naslov"] > a > p').text();
  const getPrice = $(element)
    .find('div[class="datum"] > span')
    .text()
    .replace(/\.| ?KM$/g, '')
    .replace(' ', '');
  const getPicture = $(element).find('div[class="slika"] > img').attr('src');
  //building an array of objects from the scraped data.
  olxScrapedData[index] = {
    id: getLink.substring(27, 35),
    link: getLink,
    description: getDescription,
    price: parseFloat(getPrice),
    picture: getPicture,
  };
};

//Square meters need to be fetched for every single article.
//This function loads every link in the olxScrapedData array and updates each apartment object with its square-meter value.
const process_fetching_squaremeters = (olxScrapedData) => {
  const fetchSquaremeters = Promise.all(
    olxScrapedData.map((item) => {
      return axios.get(item.link).then((response) => {
        const $ = cheerio.load(response.data);
        const getSquaremeters = $('div[class="df2  "]')
          .first()
          .text()
          .replace('m2', '')
          .replace(',', '.')
          .split('-')[0];
        item.squaremeters = Math.round(getSquaremeters);
        item.pricepersquaremeter = Math.round(
          parseFloat(item.price) / parseFloat(getSquaremeters)
        );
      });
    })
  );

  fetchSquaremeters.then(() => {
    data_functions.mergeData(olxScrapedData); //Sending final array to mergeData function.
    return olxScrapedData;
  });
};

If I console.log(olxScrapedData) inside fetchSquaremeters.then, it prints the scraped apartments, but data_functions.mergeData(olxScrapedData) doesn't seem to get called. If I move that call into load_url instead, it runs and the data is merged, but without the square-meter values, and I really need that data.
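One likely explanation for this symptom, sketched below with hypothetical stand-ins (`fakeGet` for axios.get, `makeDataFunctions` for the data_functions module): mergeData probably is called, but only after Promise.all in scraping-service has already resolved and validateData has run on a still-empty array, because process_fetching_squaremeters starts its requests without returning the promise.

```javascript
// Hypothetical stand-in for axios.get: resolves with `data` after `ms` milliseconds.
const fakeGet = (data, ms) =>
  new Promise((resolve) => setTimeout(() => resolve(data), ms));

// Hypothetical stand-in for the data_functions module (fresh state per run).
const makeDataFunctions = () => ({
  mergedApartments: [],
  mergeData(items) {
    this.mergedApartments.push(...items);
  },
});

// Mirrors process_fetching_squaremeters: the inner Promise.all is started but NOT returned.
const processFetchingSquaremeters = (items, dataFunctions) => {
  const fetchSquaremeters = Promise.all(
    items.map((item) =>
      fakeGet(60, 20).then((sqm) => {
        item.squaremeters = sqm;
      })
    )
  );
  fetchSquaremeters.then(() => dataFunctions.mergeData(items));
};

// Mirrors olxScraper + load_url: awaits the listing page, then starts the detail requests.
const olxScraper = async (dataFunctions) => {
  const items = await fakeGet([{ id: 1 }, { id: 2 }], 10);
  processFetchingSquaremeters(items, dataFunctions);
};

async function main() {
  const dataFunctions = makeDataFunctions();
  await Promise.all([olxScraper(dataFunctions)]);
  // This is the point where scraping-service calls validateData(...).
  const seenByValidate = dataFunctions.mergedApartments.length;
  await fakeGet(null, 50); // give the detail requests time to finish
  return { seenByValidate, mergedLater: dataFunctions.mergedApartments.length };
}

main().then((result) => console.log(result)); // { seenByValidate: 0, mergedLater: 2 }
```

In this model the merge does happen, just too late for the validation step to see it.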

So my question is: how do I make this work? Do I need to call the function somewhere else?

All I want is for the final olxScrapedData to be sent to mergeData, so that the arrays from the different scrapers get merged into one.

Thanks!

Edit: here is what the other scraper looks like: https://jsfiddle.net/oh03mp8t/. Note that this scraper doesn't use any promises.
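Since the other scraper doesn't use promises, it may help to know that Promise.all also accepts plain (non-promise) values and treats them as already resolved. A synchronous scraper in the array is therefore fine, provided it really finishes all its work before returning. The scraper names below are hypothetical.

```javascript
// Hypothetical synchronous scraper: returns its data directly, no promises involved.
const syncScraper = () => [{ id: 'sant-1' }];

// Hypothetical asynchronous scraper: resolves with its data after a short delay.
const asyncScraper = () =>
  new Promise((resolve) => setTimeout(() => resolve([{ id: 'olx-1' }]), 10));

// Promise.all treats the plain array returned by syncScraper as an already-resolved value.
Promise.all([asyncScraper(), syncScraper()]).then(([olx, sant]) => {
  console.log([...olx, ...sant]); // [ { id: 'olx-1' }, { id: 'sant-1' } ]
});
```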

Upvotes: 0

Views: 77

Answers (2)

pupeeterMaster

Reputation: 81

Try adding this: const process_fetching_squaremeters = async (olxScrapedData) ... and then await fetchSquaremeters.then(..).

As James explained in the other answer, you must wait for this promise to resolve for everything to execute in the correct order. If you don't have experience with async/await and promises, I suggest watching some courses on them to really understand what is happening here.
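A minimal sketch of the async/await approach applied along the whole chain, not only in process_fetching_squaremeters: load_url must also await it, and olxScraper must await load_url. The HTTP calls and data_functions are replaced by hypothetical stand-ins (`fakeGet`, `makeDataFunctions`) so the promise flow can run on its own; in the real scraper these would be axios.get and the cheerio parsing.

```javascript
// Hypothetical stand-in for axios.get: resolves with { data } after `ms` milliseconds.
const fakeGet = (data, ms) =>
  new Promise((resolve) => setTimeout(() => resolve({ data }), ms));

// Hypothetical stand-in for data_functions (fresh state per run).
const makeDataFunctions = () => ({
  mergedApartments: [],
  mergeData(items) {
    this.mergedApartments.push(...items);
  },
});

// async + await: this promise settles only after every detail request has finished.
const processFetchingSquaremeters = async (items, dataFunctions) => {
  await Promise.all(
    items.map(async (item) => {
      const response = await fakeGet('75 m2', 20); // stands in for axios.get(item.link)
      item.squaremeters = parseFloat(response.data); // parseFloat('75 m2') === 75
    })
  );
  dataFunctions.mergeData(items);
  return items;
};

const loadUrl = async (dataFunctions) => {
  const response = await fakeGet([{ id: 1 }, { id: 2 }], 10); // stands in for axios.get(url)
  // Awaiting here is the missing step: without it, loadUrl resolves before the details arrive.
  const items = await processFetchingSquaremeters(response.data, dataFunctions);
  return items;
};

const olxScraper = async (dataFunctions) => {
  try {
    return await loadUrl(dataFunctions); // `return await` so the catch below sees rejections
  } catch (error) {
    console.log(error);
    return [];
  }
};

async function main() {
  const dataFunctions = makeDataFunctions();
  await Promise.all([olxScraper(dataFunctions)]);
  // validateData would now see the merged data, square meters included.
  return dataFunctions.mergedApartments;
}

main().then((merged) => console.log(merged));
```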

Upvotes: 1

James McGuigan

Reputation: 8106

Are you missing return/await statements from inside your promise/async statements, especially when your last statement is also a promise?

Without that, you may simply be scheduling the promise to run at a later time, rather than returning it so that Promise.all() waits for it.
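The same fix can be expressed with plain promise chains by returning each inner promise, as this answer suggests. A minimal sketch, again with hypothetical stand-ins (`fakeGet` for axios.get, `makeDataFunctions` for data_functions) rather than the real scraper code:

```javascript
// Hypothetical stand-in for axios.get: resolves with { data } after `ms` milliseconds.
const fakeGet = (data, ms) =>
  new Promise((resolve) => setTimeout(() => resolve({ data }), ms));

// Hypothetical stand-in for data_functions (fresh state per run).
const makeDataFunctions = () => ({
  mergedApartments: [],
  mergeData(items) {
    this.mergedApartments.push(...items);
  },
});

// Returning the whole Promise.all(...).then(...) chain lets callers wait on it.
const processFetchingSquaremeters = (items, dataFunctions) => {
  return Promise.all(
    items.map((item) =>
      fakeGet('75 m2', 20).then((response) => {
        item.squaremeters = parseFloat(response.data);
      })
    )
  ).then(() => {
    dataFunctions.mergeData(items);
    return items;
  });
};

// The arrow function's expression body implicitly returns the inner promise,
// so the outer promise does not settle until processFetchingSquaremeters does.
const loadUrl = (dataFunctions) =>
  fakeGet([{ id: 1 }, { id: 2 }], 10).then((response) =>
    processFetchingSquaremeters(response.data, dataFunctions)
  );

function main() {
  const dataFunctions = makeDataFunctions();
  return Promise.all([loadUrl(dataFunctions)]).then(
    () => dataFunctions.mergedApartments
  );
}

main().then((merged) => console.log(merged));
```

Writing `.then((response) => { processFetchingSquaremeters(...); })` with braces and no `return` would bring back the original problem.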

Upvotes: 0
