ClementParis016

Reputation: 509

How to write a JSON array to a file with Node.js writeStream?

I wrote a little Node.js script to scrape data from a website, iterating through its pages and extracting structured data from each one.

The data I extract from each page is in the form of an array of objects.

I thought I could use the fs.createWriteStream() method to create a writable stream to which I could write the data incrementally after each page extraction.

Apparently, you can only write a String or a Buffer to the stream, so I'm doing something like this:

output.write(JSON.stringify(operations, null, 2));
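
In context, the loop looks roughly like this (simplified; scrapePage() and pageUrls are stand-ins for my actual scraping code):

const fs = require('fs');

const output = fs.createWriteStream('./operations.json');

async function run(pageUrls) {
    for (const url of pageUrls) {
        // scrapePage() returns an array of objects for one page
        const operations = await scrapePage(url);
        // appends a whole JSON array to the file for every page
        output.write(JSON.stringify(operations, null, 2));
    }
    output.end();
}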

But in the end, once I close the stream, the JSON is malformed because obviously I just appended the array of each page one after the other, resulting in something looking like this:

[
    { ... },  /* data for page 1 */
    { ... }
][ /* => here is the problem */
    { ... },  /* data for page 2 */
    { ... }
]

How could I merge the arrays into a single one in the output, instead of just concatenating them? Is it even doable?

Upvotes: 3

Views: 9641

Answers (1)

user1751825

Reputation: 4309

Your options would be...

  1. Keep the full array in memory and only write to the JSON file at the end, after processing all pages.
  2. Write each object individually, and handle the square brackets and commas manually.

Something like this...

//start processing
output.write('[');
//loop through your pages, however you're doing that
while (more_data_to_read()) {
    //create "operation" object
    const operation = get_operation_object();
    output.write(JSON.stringify(operation, null, 2));
    if (!is_last_page()) {
        //write out comma to separate operation objects within array
        output.write(',');
    }
}
//all done, close the json array
output.write(']');

This will create well-formed JSON.

Personally, I would opt for #1, though, as it seems the more 'correct' way to do it. If you're concerned about the array using too much memory, then JSON may not be the best choice for the data file; it's not particularly well suited to extremely large data sets.
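
For reference, a minimal sketch of option #1 could look like this (scrapePage() and pageUrls are placeholders for your own scraping code):

const fs = require('fs');

async function run(pageUrls) {
    const allOperations = [];
    for (const url of pageUrls) {
        // collect each page's array into one big in-memory array
        const operations = await scrapePage(url);
        allOperations.push(...operations);
    }
    // a single write at the end produces one well-formed JSON array
    fs.writeFileSync('./operations.json', JSON.stringify(allOperations, null, 2));
}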

In the code sample above, if the process gets interrupted partway through, you'll be left with an invalid JSON file, so writing progressively won't actually make the application more fault-tolerant.

Upvotes: 9
