Reputation: 509
I wrote a little Node.js script to scrape data from a website, iterating through its pages to extract structured data.
The data I extract from each page takes the form of an array of objects.
I thought I could use the fs.createWriteStream()
method to create a writable stream to which I could write the data incrementally after each page extraction.
Apparently, you can only write a String or a Buffer to the stream, so I'm doing something like this:
output.write(JSON.stringify(operations, null, 2));
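Roughly, the whole thing looks like this (scrapePage() and lastPage are just placeholders for the actual per-page extraction logic):

const fs = require('fs');

// writable stream for the output file
const output = fs.createWriteStream('./output.json');

for (let page = 1; page <= lastPage; page++) {
  // scrapePage() stands in for the real extraction; it returns an array of objects for one page
  const operations = scrapePage(page);
  output.write(JSON.stringify(operations, null, 2));
}

output.end();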
But in the end, once I close the stream, the JSON is malformed because obviously I just appended each page's array one after the other, resulting in something like this:
[
{ ... }, /* data for page 1 */
{ ... }
][ /* => here is the problem */
{ ... }, /* data for page 2 */
{ ... }
]
How could I actually merge the arrays in the output instead of just chaining them? Is it even doable?
Upvotes: 3
Views: 9641
Reputation: 4309
Your options would be:

1. Keep all of the extracted data in memory and write it out with a single JSON.stringify() call once you're done, or
2. Manage the json formatting yourself as you write progressively: write the opening bracket, then each object with a separating comma, then the closing bracket.

Option #2 would look something like this:
//start processing
output.write('[');

//loop through your pages, however you're doing that
while (more_data_to_read()) {
  //create "operation" object
  var operation = get_operation_object();
  output.write(JSON.stringify(operation, null, 2));
  if (!is_last_page()) {
    //write out comma to separate operation objects within array
    output.write(',');
  }
}

//all done, close the json array
output.write(']');
This will create well-formed json.
Personally, I would opt for #1 though, as it seems the more 'correct' way to do it. If you're concerned about the array using too much memory, then json may not be the best choice for the data file. It's not particularly well suited to extremely large data-sets.
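A minimal sketch of what option #1 could look like, reusing the placeholder functions and the output stream from the sample above:

//option #1: keep every operation in memory, serialize once at the end
var allOperations = [];

while (more_data_to_read()) {
  //get_operation_object() is the same placeholder as in the sample above
  allOperations.push(get_operation_object());
}

//a single stringify call over the whole array produces well-formed json
output.write(JSON.stringify(allOperations, null, 2));
output.end();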
Also note that with the progressive approach of option #2, if the process gets interrupted partway through, you'll end up with an invalid json file, so writing progressively won't actually make the application more fault-tolerant.
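As an aside, if progressive, interruption-tolerant output matters more than having a single json array, a common workaround is newline-delimited json: write one complete json document per line and skip the surrounding brackets and commas entirely. A rough sketch, again using the placeholder functions above:

//newline-delimited json: one complete document per line, no enclosing array
while (more_data_to_read()) {
  output.write(JSON.stringify(get_operation_object()) + '\n');
}

Each line can then be parsed on its own, so a partially written file is still usable up to the last complete line.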
Upvotes: 9