TheHeroBrine

Reputation: 17

Weirdness when modifying massive JSON file/array in Node.js

I am working with a massive JSON file (almost 60 MB) and I am trying to remove all entries in it where volume = 0. The format of each entry in the array is

{
  "date": 1424373000,
  "high": 0.33,
  "low": 225,
  "open": 0.33,
  "close": 225,
  "volume": 0.999999,
  "quoteVolume": 0.00444444,
  "weightedAverage": 225
}

To do this I am using this code.

fs.readFile('JSONFiles/poloniexBTCDataFeb19|2015-July2|2018.json', function read(err, data) {
  if (err) {
    throw err;
  }
  rawdata = JSON.parse(data);
  rawdata.forEach(function(val, index, array) {
    if (rawdata[index].volume == 0) {
      rawdata.splice(index, 1)
    }
  })
});

The problem with this is that it only deletes about half of the entries with this characteristic (60k/108k). I worked around this with a for loop that runs the code 9 times, which deletes them all, but this makes the code take significantly longer, because the whole JSON file has about 360k entries and it has to check each one with that if statement. Is there any way to delete them all without having to use a for loop in this manner?

EDIT: I have realized that I do not need this code in the first place, so never mind, but thanks for all the answers. I hope this helps someone else when they hit a similar issue.

Upvotes: 1

Views: 51

Answers (3)

RaR

Reputation: 3213

The problem is that you are mutating the array rawdata while iterating over it. Take an example array [e1, e2, e3, e4] and the following code:

var arr = ['e1', 'e2', 'e3', 'e4']

arr.forEach(function(elem, idx){
  console.log('checking elem', elem);
  if (elem === 'e2'){
    arr.splice(idx, 1)
  }
});

console.log('\nAfter iteration', arr);

As you can see, I remove e2 when I encounter it. splice changes the actual array: e3 shifts into index 1, which forEach has already visited, so e3 is never inspected and the next element checked is e4. For this reason it is best not to mutate an array inside a forEach over that same array.
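
This also explains why only part of the zero-volume entries disappear in the question: whenever two of them sit next to each other, the second one slides into the slot that was just visited and is skipped. A minimal sketch, assuming a made-up array of volume values (`volumes` and `spliceInsideForEach` are illustrative names, not from the question):

```javascript
// Hypothetical sample: only the volume values, with zeros sitting next to each other.
const volumes = [0, 0, 1, 0, 0];

// Same pattern as in the question: splice inside forEach over the same array.
function spliceInsideForEach(arr) {
  arr.forEach(function (val, index) {
    if (val === 0) {
      arr.splice(index, 1); // everything after index shifts left by one
    }
  });
  return arr;
}

const remaining = spliceInsideForEach(volumes.slice());
console.log(remaining); // [ 0, 1, 0 ]
```

Two of the four zeros survive a single pass, which is why repeating the loop several times eventually removed them all.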

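You can do something like this:

var removed = 0;
rawdata.slice().forEach(function (val, index) {
   if (val.volume == 0) {
      // index belongs to the copy, so shift it by the number of elements already removed
      rawdata.splice(index - removed, 1);
      removed++;
   }
});

Here slice() makes a shallow copy, so mutating your original rawdata does not affect the iteration. Because the copy's indexes drift away from rawdata's after each removal, the removed counter is needed to keep them in step.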

Upvotes: 2

6502

Reputation: 114481

Your code is buggy, and the error is quite common: iterating over an array while mutating it. The code is also very inefficient, because each deletion moves all the following elements by one place (using splice doesn't mean no loop is run; there is still a loop behind the scenes to implement that function).

If you need to remove elements from an array in-place (i.e. you don't want to get a copy) a simple approach is using what I normally call a read-skip-write loop:

let wp = 0; // the "write pointer"
for (let x of data) {
    if (keep(x)) data[wp++] = x;
}
data.length = wp; // trim unused space
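
For the data in the question, the loop above might look like the following. This is only a sketch: the sample entries, the `keep` predicate and the `removeInPlace` wrapper are assumptions made for illustration, and the comparison assumes volume is always a number after JSON.parse.

```javascript
// Hypothetical sample standing in for the parsed JSON array.
const data = [
  { date: 1, volume: 0 },
  { date: 2, volume: 0.999999 },
  { date: 3, volume: 0 },
  { date: 4, volume: 0 },
  { date: 5, volume: 2.5 },
];

// Hypothetical predicate: keep only entries with a non-zero volume.
function keep(entry) {
  return entry.volume !== 0;
}

// Read-skip-write: every element is read once, kept elements are written
// back at the write pointer, and the tail is cut off at the end.
function removeInPlace(array, predicate) {
  let wp = 0; // the "write pointer", never ahead of the read position
  for (const x of array) {
    if (predicate(x)) array[wp++] = x;
  }
  array.length = wp; // trim unused space
  return array;
}

removeInPlace(data, keep);
console.log(data.map(function (e) { return e.date; })); // [ 2, 5 ]
```

Each element is touched once, so the cost stays linear in the array size no matter how many entries are removed.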

PS: as a side note, try to change your mindset about programming. If your first thought is that Node is buggy, then you're not going to get very far in coding. The reality is that 99.99% of the time the bug is in your code; looking somewhere else won't make you a better programmer.

Upvotes: 0

Dushyant Bangal

Reputation: 6403

You are splicing the records one at a time, which might be what is taking the time.
Instead of the forEach, try this:

var filteredData = rawdata.filter(function (val) {
    return val.volume != 0
})
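
Note that filter returns a new array and leaves the original untouched, so the result has to be used (or assigned back) afterwards. A small sketch, assuming a short made-up JSON string in place of the 60 MB file:

```javascript
// Hypothetical stand-in for the contents of the JSON file.
const json = '[{"date":1,"volume":0},{"date":2,"volume":0.999999},{"date":3,"volume":0}]';

const rawdata = JSON.parse(json);

// Keep only the entries whose volume is not zero.
const filteredData = rawdata.filter(function (val) {
  return val.volume !== 0;
});

console.log(filteredData.length); // 1
console.log(rawdata.length);      // 3, the original array is unchanged
```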

Upvotes: 2
