Reputation: 1255
I'm trying to get the data from a huge file (800k rows) and put it into database via lambda (AWS). To do that I'm getting the xlsx file from S3 as a buffer and reading it.
module.exports.getSalesData = new Promise((resolve, reject) => {
getFileFromS3(filename)
.then(function (workbook) {
console.log(workbook.SheetNames[1]); // 'sales'
console.log(workbook.SheetNames); // showing sales as [ 'main', 'sales', 'Sheet1' ]
console.log(Array.isArray(workbook.SheetNames)); // true
console.log(typeof workbook.SheetNames); // Object
console.log(Object.keys(workbook.Sheets)); // [ 'main', 'Sheet1' ] == why 'sales' is not here?
var sheet_name = workbook.SheetNames[1]; // sales tab
var json_sheet = XLSX.utils.sheet_to_json(workbook.Sheets[sheet_name], { raw: true })
resolve(json_sheet)
})
.catch(err => {
console.log('File: ' + filename + ' doesn\'t exists on S3 or you\'re not connected to the internet.');
})
})
The issue is that for workbook.Sheets
I should see [ 'main', 'sales', 'Sheet1' ]
, right?
Then I try to get the number of rows (already converted to JSON) like this:
getSalesData.then(function (data) {
console.log(data.length + ' rows');
console.log(data[0]);
}).catch(err => console.error(err));
Where the parameter data
is the json_sheet
defined in the function above.
So for data.length
(number of rows) I get 0 instead of 800k+.
And, of course, I'm unable to get data[0]
which is undefined
.
PS.: the file has 57.3mb -- Not sure if it's the cause.
Thanks in advance for help.
Upvotes: 0
Views: 1243
Reputation: 1255
So basically what was happening is that NodeJS wasn't able to read the full file because it crashes the NodeJS VM memory limit for strings.
So what I had to do is to increase the memory limit like this:
node --max-old-space-size=2048 services/process/process-sales.js
Which will increase from 512MB to 2048MB / 2GB of memory for NodeJS.
But this is just a solution to read large amount of values.
I don't recommend using NodeJS to threat large amount of data like this. Instead go with Python using some library like Pandas which is awesome to do it.
PS.: Just my opinion and experience by dealing with data using nodejs. I don't think nodejs was made for it.
Upvotes: 1