Reputation: 17
I am working with a massive JSON file (almost 60 MB) from which I am trying to remove all entries where volume = 0. Each entry in the array has this format:
{
"date": 1424373000,
"high": 0.33,
"low": 225,
"open": 0.33,
"close": 225,
"volume": 0.999999,
"quoteVolume": 0.00444444,
"weightedAverage": 225
}
To do this, I am using the following code:
const fs = require('fs');

fs.readFile('JSONFiles/poloniexBTCDataFeb19|2015-July2|2018.json', function read(err, data) {
  if (err) {
    throw err;
  }
  rawdata = JSON.parse(data);
  rawdata.forEach(function (val, index, array) {
    if (rawdata[index].volume == 0) {
      rawdata.splice(index, 1);
    }
  });
});
The problem is that this only deletes about half of the entries with this characteristic (60k/108k). I worked around it by wrapping the code in a for loop that runs it 9 times, which deletes them all, but this makes the code take significantly longer, because the whole JSON file has about 360k entries and every pass has to check each one with that if statement. Is there a way to delete them all in one go, without running a for loop like this?
EDIT: I have realized that I do not need this code in the first place, so never mind, but thanks for all the answers. I hope this helps someone else who hits a similar issue.
Upvotes: 1
Views: 51
Reputation: 3213
The problem is that you are mutating the array rawdata while iterating over it. Let's take an example array [e1, e2, e3, e4] and this code:
var arr = ['e1', 'e2', 'e3', 'e4'];
arr.forEach(function (elem, idx) {
  console.log('checking elem', elem);
  if (elem === 'e2') {
    arr.splice(idx, 1);
  }
});
console.log('\nAfter iteration', arr);
As you can see, I remove e2 when I encounter it. This modifies the actual array: the element that shifts into e2's position is never inspected, because forEach has already visited that index. In the code above, e3 never gets checked. So it is best not to mutate an array while iterating over it with forEach.
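The same skipping explains why the question's loop removes only about half of the zero-volume entries per pass when they are adjacent: after each splice, the next zero slides into the slot that was just checked. A minimal sketch using made-up sample entries (only the volume field) and a hypothetical helper name, removeZerosOnePass:

```javascript
// Same forEach + splice pattern as the question, applied to adjacent zeros.
function removeZerosOnePass(entries) {
  entries.forEach(function (entry, index) {
    if (entry.volume === 0) {
      // Removing entries[index] shifts the next entry into this index,
      // which forEach has already visited, so that entry is skipped.
      entries.splice(index, 1);
    }
  });
  return entries;
}

const volumes = [0, 0, 0, 0, 5].map((v) => ({ volume: v }));
removeZerosOnePass(volumes);
console.log(volumes.map((e) => e.volume)); // [ 0, 0, 5 ]
```

Two of the four zeros survive one pass, which is why repeating the loop several times eventually removes them all.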
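You can do it like this:
var removed = 0; // number of entries spliced out of rawdata so far
rawdata.slice().forEach(function (val, index) {
  if (val.volume == 0) {
    // every earlier splice shifted rawdata left by one, so adjust the index
    rawdata.splice(index - removed, 1);
    removed++;
  }
});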
Here slice() makes a shallow copy of the array, so mutating your original rawdata does not affect the iteration.
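If you would rather keep splicing the original array without making a copy, another common option (not part of this answer) is to iterate backwards, so each splice only shifts elements that have already been checked. A minimal sketch, with a hypothetical function name and made-up sample data:

```javascript
// Removes zero-volume entries in place by walking from the end; a splice at
// index i only moves elements after i, which have already been visited.
function removeZeroVolumeBackwards(entries) {
  for (let i = entries.length - 1; i >= 0; i--) {
    if (entries[i].volume === 0) {
      entries.splice(i, 1);
    }
  }
  return entries;
}

const sample = [{ volume: 0 }, { volume: 0 }, { volume: 2 }, { volume: 0 }];
removeZeroVolumeBackwards(sample);
console.log(sample); // [ { volume: 2 } ]
```

This fixes the skipping, but every splice still shifts the rest of the array, so it is not faster than the copy-based version.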
Upvotes: 2
Reputation: 114481
Your code is buggy, and the error is quite common (iterating over an array while mutating it). The code is also very inefficient, because each deletion moves all the following elements back by one place (using splice doesn't mean a loop isn't executed... there is still a loop behind the scenes implementing that function).
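To make the cost argument concrete, here is a back-of-the-envelope counter. It is a model, not a measurement: it assumes splice(i, 1) on an array of length n shifts the n - i - 1 elements after index i, and the function names and sample data are hypothetical.

```javascript
// Model: counts element shifts if every non-kept entry is removed with splice.
function spliceShifts(entries, keep) {
  let length = entries.length;
  let index = 0; // position of the current entry in the shrinking array
  let shifts = 0;
  for (const entry of entries) {
    if (keep(entry)) {
      index++;
    } else {
      shifts += length - index - 1; // everything after it moves left by one
      length--;
    }
  }
  return shifts;
}

// A write-pointer loop writes each kept entry once and never shifts.
function writePointerWrites(entries, keep) {
  return entries.filter(keep).length;
}

const allZero = Array.from({ length: 1000 }, () => ({ volume: 0 }));
const keepNonZero = (entry) => entry.volume !== 0;
console.log(spliceShifts(allZero, keepNonZero)); // 499500
console.log(writePointerWrites(allZero, keepNonZero)); // 0
```

In the worst case the splice approach grows roughly with n²/2, while a single compaction pass stays linear.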
If you need to remove elements from an array in-place (i.e. you don't want to get a copy) a simple approach is using what I normally call a read-skip-write loop:
let wp = 0; // the "write pointer"
for (let x of data) {
  if (keep(x)) data[wp++] = x;
}
data.length = wp; // trim unused space
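Wrapped in a function with keep passed in as a parameter, the same read-skip-write loop can be applied directly to the parsed entries. A minimal sketch; removeInPlace and the sample entries are hypothetical:

```javascript
// Read-skip-write compaction: keeps entries for which keep(x) is true,
// in place, in a single pass over the array.
function removeInPlace(data, keep) {
  let wp = 0; // the "write pointer"
  for (const x of data) {
    if (keep(x)) data[wp++] = x; // wp never passes the read position
  }
  data.length = wp; // trim unused space
  return data;
}

const feed = [{ volume: 0 }, { volume: 1 }, { volume: 0 }, { volume: 3 }];
removeInPlace(feed, (entry) => entry.volume !== 0);
console.log(feed); // [ { volume: 1 }, { volume: 3 } ]
```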
PS: as a side note, try to change your mindset about programming. If your first thought is that Node is buggy, you're not going to get very far in coding. In reality, 99.99% of the time the bug is in your code... looking somewhere else won't make you a better programmer.
Upvotes: 0
Reputation: 6403
You are splicing the records, which is likely what is taking so long. Instead of forEach, try this:
var filteredData = rawdata.filter(function (val) {
  return val.volume != 0;
});
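Putting filter into the question's readFile flow could look like the sketch below. The helper names are hypothetical, and writing the result back to disk with fs.writeFile is an assumption about what you want to do with the filtered data.

```javascript
const fs = require('fs');

// Parses the feed and keeps only entries with a non-zero volume.
// filter() builds a new array, so nothing is mutated during iteration.
function removeZeroVolume(jsonText) {
  const entries = JSON.parse(jsonText);
  return entries.filter((entry) => entry.volume !== 0);
}

// Reads inPath, filters it, and writes the result to outPath (hypothetical).
function filterFile(inPath, outPath) {
  fs.readFile(inPath, 'utf8', (err, text) => {
    if (err) throw err;
    const filtered = removeZeroVolume(text);
    fs.writeFile(outPath, JSON.stringify(filtered), (writeErr) => {
      if (writeErr) throw writeErr;
      console.log(`kept ${filtered.length} entries`);
    });
  });
}
```

For example, filterFile('JSONFiles/poloniexBTCDataFeb19|2015-July2|2018.json', 'JSONFiles/filtered.json') would write a copy without the zero-volume entries, where the output file name is made up for illustration.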
Upvotes: 2