Reputation: 613
Assume there is a shop with 500 products, each with an ID from 1 to 500, and each product's data stored in a JSON file living under a URL (e.g. myshop.com/1.json, myshop.com/2.json, etc.).
Using a Node.js script, I would like to download all of these JSON files and store them locally. I can do it consecutively:
const fs = require('fs');
const axios = require('axios');

(async () => {
  const totalProductsCount = 500;
  try {
    let currentItem = 1;
    while (currentItem < (totalProductsCount + 1)) {
      const product = await axios.get(`https://myshop.com/${currentItem}.json`);
      fs.writeFileSync(`./product-${currentItem}.json`, JSON.stringify(product.data, null, 2));
      currentItem++;
    }
  } catch (e) {
    return;
  }
})();
This works. However, I'd like to download these files fast, really fast. So I am trying to split all of my requests into groups and fetch those groups in parallel. What I have is the following:
const _ = require('lodash');
const fs = require('fs');
const axios = require('axios');

const getChunk = async (chunk, index) => {
  // The counter here is used for logging purposes only
  let currentItem = 1;
  try {
    // Iterate through the chunk's 50 items one after another
    await chunk.reduce(async (promise, productId) => {
      await promise;
      const product = await axios.get(`https://myshop.com/${productId}.json`);
      if (product && product.data) {
        console.log('Got product', currentItem, 'from chunk', index);
        fs.writeFileSync(`./product-${productId}.json`, JSON.stringify(product.data, null, 2));
      }
      currentItem++;
    }, Promise.resolve());
  } catch (e) {
    throw e;
  }
};

const getProducts = async () => {
  const totalProductsCount = 500;
  // Create an array of 500 elements => [1, 2, 3, 4, ..., 499, 500]
  const productIds = Array.from({ length: totalProductsCount }, (_, i) => i + 1);
  // Using lodash, chunk that array into 10 groups of 50 each
  const chunkBy = Math.ceil(productIds.length / 10);
  const chunked = _.chunk(productIds, chunkBy);
  // Run `getChunk` on each of the chunks in parallel
  const products = await Promise.all([
    ...chunked.map((chunk, index) => getChunk(chunk, index))
  ]);
  // If the items are to be returned here, it should be with a single-level array
  return _.flatten(products);
};

(async () => {
  const products = await getProducts();
})();
This seems to work most of the time, especially with a smaller number of items. However, there is a behaviour I cannot explain: the script hangs when I ask for larger quantities of items.
What would be the best way / best practice to achieve this, while also being able to catch any files that hang or fail to download? My thought is that I could download whatever I can with the chunked approach, get back an array of all the product IDs that failed, and then download those consecutively using the first method.
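For illustration, a minimal sketch of that failed-ID bookkeeping (per product rather than per chunk, and the downloadProduct helper is hypothetical, standing in for the body of the while loop above):
const fs = require('fs');
const axios = require('axios');

// Hypothetical helper: fetch a single product and write it to disk.
const downloadProduct = async (productId) => {
  const product = await axios.get(`https://myshop.com/${productId}.json`);
  fs.writeFileSync(`./product-${productId}.json`, JSON.stringify(product.data, null, 2));
};

(async () => {
  const productIds = Array.from({ length: 500 }, (_, i) => i + 1);
  // allSettled never rejects, so every download gets a chance to finish.
  const results = await Promise.allSettled(productIds.map((id) => downloadProduct(id)));
  // Collect the IDs whose download was rejected, to retry consecutively afterwards.
  const failedIds = productIds.filter((id, index) => results[index].status === 'rejected');
  console.log('Failed IDs:', failedIds);
})();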
Upvotes: 1
Views: 885
Reputation: 5289
You are writing files synchronously in the middle of an async flow! writeFileSync blocks the event loop until each file is written, so change it to the async version (fs.promises.writeFile). That should be an immediate improvement.
As an additional performance enhancement, you would ideally use a code path that does not parse the response at all if you only want the result written straight to a file. It looks like you can use responseType: 'stream' in your request config to accomplish this, which avoids the overhead of parsing the response into a JS object before writing it to the file.
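A rough sketch of what that could look like (the streamProductToFile name is just for illustration, and stream/promises requires Node 15+):
const fs = require('fs');
const { pipeline } = require('stream/promises'); // Node 15+
const axios = require('axios');

// Stream the response body straight to disk instead of parsing it into a
// JS object and re-serialising it with JSON.stringify.
const streamProductToFile = async (productId) => {
  const response = await axios.get(`https://myshop.com/${productId}.json`, {
    responseType: 'stream',
  });
  // pipeline resolves once the file has been fully written (and rejects on error).
  await pipeline(response.data, fs.createWriteStream(`./product-${productId}.json`));
};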
It also sounds like you may want to lower the timeout on your HTTP requests, so that a request fails after a few seconds instead of hanging on a response that will never arrive. If you refer to the docs, there is a timeout param on the request config that you can lower to a few seconds: https://axios-http.com/docs/req_config
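For example (the getWithTimeout helper and the 5000 ms value are just illustrative):
const axios = require('axios');

// Abort any request that takes longer than 5 seconds instead of waiting indefinitely.
const getWithTimeout = (url) => axios.get(url, { timeout: 5000 });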
Upvotes: 2