Reputation: 177
I am querying an MSSQL database table with over 15 million rows and processing all of the returned data in Node.js. With a basic SQL query that is immediately a problem: the amount of data held in memory crashes the Node process.
Alternatively, I gave Node more memory with --max-old-space-size=8000, but that did not seem to help either. I have also tried Node.js streams, though as far as I understand (correct me if I am wrong), streams are only useful if I process the data chunk by chunk rather than all at once.
Considering that I am sorting, grouping, and mapping over the entire dataset, what would be the best way to both make querying 15 million rows faster and use memory efficiently while processing the data?
Currently, all the examples I have seen suggest streaming, but every streaming example processes the data chunk by chunk. It would be very helpful if you could include a simple example with your suggestions.
Any possible ideas would be appreciated. Thanks.
Upvotes: 0
Views: 1762
Reputation: 865
The npm mssql documentation has a really nice explanation of streaming without crashing: you should intermittently pause your request.
When streaming large sets of data you want to back-off or chunk the amount of data you're processing to prevent memory exhaustion issues; you can use the Request.pause() function to do this
Here is its example of managing rows in batches of 15 (I have added the connection/streaming setup around the snippet for context; the config and table name are placeholders):
const sql = require('mssql');

// Assumes a connection pool has already been opened with sql.connect(config);
// the table name below is a placeholder.
const request = new sql.Request();
request.stream = true; // emit rows one at a time instead of buffering the whole result set
request.query('SELECT * FROM verylargetable');

let rowsToProcess = [];

request.on('row', row => {
  rowsToProcess.push(row);
  if (rowsToProcess.length >= 15) {
    request.pause(); // stop emitting rows while this batch is handled
    processRows();
  }
});

request.on('error', err => {
  console.error(err); // may be emitted more than once
});

request.on('done', () => {
  processRows(); // handle the final, possibly partial batch
});

function processRows() {
  // process rows (sort / group / map the current batch here)
  rowsToProcess = [];
  request.resume(); // let the driver emit the next batch
}
Upvotes: 2