Reputation: 83
So the basic idea was to write a method that scrapes a web page to get JSON data containing the rating of a product, and then call this method multiple times over a few domains (.de, .uk, .fr, .nl, etc.) to collect all the ratings.
So I ended up with a scrapWebPage method, which scrapes a single page:
const axios = require('axios')
const cheerio = require('cheerio')

const scrapWebPage = async (countryAppData, productNumber) => {
  const shopUrl = `https://www.shopExample.${countryAppData.countryCode}/?q=${productNumber}`
  const avoidCORSUrl = 'https://allorigins.me/get?url=' + shopUrl + '&callback=?'
  return await axios
    .get(avoidCORSUrl, {xmlMode: false, normalizeWhitespace: true})
    .then(response => {
      const $ = cheerio.load(response.data)
      // read the text content of the first <script> element
      // (.get(0) returns that element; .get() alone returns an array)
      const contentForParsing = $("script").get(0).children[0].data
      const scrapedWebPageJson = JSON.parse(contentForParsing)
      return scrapedWebPageJson
    })
}
scrapWebPage also contains some parsing to return the JSON data I want. It resolves correctly (I tested this) and returns a Promise.
But then I'd like to call this method over multiple domains, so I created getProductDataFromManyDomains:
const getProductDataFromManyDomains = (productNum) => {
  let prodData = {
    reviews: []
  }
  const appCountries = [
    {countryCode: 'nl'},
    {countryCode: 'pl'},
    {countryCode: 'de'}
  ]
  appCountries.forEach(async countryApp => {
    let countryData = {}
    let parsedWebPage = await scrapWebPage(countryApp, productNum)
    countryData.countryCode = countryApp.countryCode
    countryData.ratingCount = parsedWebPage.aggregateRating.ratingCount
    countryData.ratingValue = parsedWebPage.aggregateRating.ratingValue
    countryData.reviews = parsedWebPage.reviews
    prodData.reviews.push(countryData)
  })
  return prodData
}
And now I receive prodData before it is populated, while I'd like to receive the actual data (the populated prodData).
I'm not sure how I should construct getProductDataFromManyDomains so that it returns the data and not the unpopulated prodData. Is that possible? Or what is a good pattern for dealing with this kind of thing?
Upvotes: 0
Views: 64
Reputation: 707376
Use a for loop instead of .forEach(). The for loop will pause for await; the .forEach() loop will not. This is because the async callback you pass to .forEach() returns a promise, but .forEach() is not designed to do anything with that promise, so it does not wait for it to resolve before continuing the loop. A for loop using await does.
Then, getProductDataFromManyDomains() will need to be async and will return a promise that resolves with your final result.
async function getProductDataFromManyDomains(productNum) {
  let prodData = {
    reviews: []
  }
  const appCountries = [
    {countryCode: 'nl'},
    {countryCode: 'pl'},
    {countryCode: 'de'}
  ]
  for (let countryApp of appCountries) {
    let countryData = {}
    // the loop pauses here until this request has finished
    let parsedWebPage = await scrapWebPage(countryApp, productNum)
    countryData.countryCode = countryApp.countryCode
    countryData.ratingCount = parsedWebPage.aggregateRating.ratingCount
    countryData.ratingValue = parsedWebPage.aggregateRating.ratingValue
    countryData.reviews = parsedWebPage.reviews
    prodData.reviews.push(countryData)
  }
  // this will be the resolved value of the promise that
  // getProductDataFromManyDomains() returns
  return prodData;
}
// usage
getProductDataFromManyDomains(productNum).then(result => {
  console.log(result);
});
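To see the difference between the two loops in isolation, here is a minimal sketch. It assumes a hypothetical fakeRequest() helper standing in for scrapWebPage() (it just resolves with its argument after a short delay), and the names collectWithForEach and collectWithForOf are made up for this illustration.

```javascript
// Hypothetical stand-in for scrapWebPage(): resolves with its input after 10 ms.
const fakeRequest = (code) =>
  new Promise(resolve => setTimeout(() => resolve(code), 10));

// .forEach() ignores the promise returned by the async callback,
// so this function returns before any request has finished.
function collectWithForEach(codes) {
  const results = [];
  codes.forEach(async code => {
    results.push(await fakeRequest(code));
  });
  return results;
}

// for...of inside an async function pauses at each await,
// so the array is fully populated before it is returned.
async function collectWithForOf(codes) {
  const results = [];
  for (const code of codes) {
    results.push(await fakeRequest(code));
  }
  return results;
}

async function demo() {
  const early = collectWithForEach(['nl', 'pl', 'de']);
  console.log(early.length); // 0: nothing has resolved yet
  const complete = await collectWithForOf(['nl', 'pl', 'de']);
  console.log(complete);     // [ 'nl', 'pl', 'de' ]
}
demo();
```

The array returned by collectWithForEach() does fill in eventually, but the caller has no way of knowing when, which is the same situation as the prodData in the question.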
You could also run your multiple requests in parallel rather than one at a time, but since you originally attempted to make your code do them one at a time, I showed you how to do that.
If you wanted to do them in parallel, you would accumulate the promises in an array and use Promise.all() to know when they are all done, and you would not await each individual request.
Here's a version of the code that runs the requests in parallel, using .map() and Promise.all():
function getProductDataFromManyDomains(productNum) {
  let prodData = {
    reviews: []
  }
  const appCountries = [
    {countryCode: 'nl'},
    {countryCode: 'pl'},
    {countryCode: 'de'}
  ]
  // start all requests at once; .map() collects one promise per country
  return Promise.all(appCountries.map(countryApp => {
    return scrapWebPage(countryApp, productNum).then(parsedWebPage => {
      let countryData = {}
      countryData.countryCode = countryApp.countryCode
      countryData.ratingCount = parsedWebPage.aggregateRating.ratingCount
      countryData.ratingValue = parsedWebPage.aggregateRating.ratingValue
      countryData.reviews = parsedWebPage.reviews
      return countryData;
    });
  })).then(results => {
    // put results into prodData and make that the resolved value
    prodData.reviews = results;
    return prodData;
  });
}
// usage
getProductDataFromManyDomains(productNum).then(result => {
  console.log(result);
});
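One thing worth knowing about the parallel version: Promise.all() resolves with the results in the same order as the input array, regardless of which request finishes first. The sketch below illustrates that with a hypothetical fakeScrape() helper (not part of the original code) that resolves after a chosen delay. The delays and the demoOrder name are made up for this example.

```javascript
// Hypothetical stand-in for scrapWebPage(): resolves after `ms` milliseconds.
const fakeScrape = (countryCode, ms) =>
  new Promise(resolve => setTimeout(() => resolve({ countryCode }), ms));

async function demoOrder() {
  // 'nl' is the slowest request, yet it stays first in the results.
  const delays = [['nl', 30], ['pl', 10], ['de', 20]];
  const finished = []; // records the order in which requests complete
  const results = await Promise.all(delays.map(([code, ms]) =>
    fakeScrape(code, ms).then(data => {
      finished.push(code);
      return data;
    })
  ));
  return { results: results.map(r => r.countryCode), finished };
}

demoOrder().then(({ results, finished }) => {
  console.log(results);  // [ 'nl', 'pl', 'de' ]  (input order)
  console.log(finished); // [ 'pl', 'de', 'nl' ]  (completion order)
});
```

So prodData.reviews will line up with appCountries even though the requests overlap. Note also that Promise.all() rejects as soon as any one of the promises rejects, so a failed request for one domain fails the whole call unless you catch it per request.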
Upvotes: 2