Reputation: 310
I'm working on a project that involves a large JSON file, basically a multidimensional array dumped as JSON, whose overall size is larger than the amount of memory I have. If I load it in as a string and then parse that string, it will consume all of my memory.
Are there any methods to limit the memory consumption, such as only retrieving data between specific indices? Could I implement that using only the nlohmann/json library and the standard library?
Upvotes: 0
Views: 1275
Reputation: 1688
Using DAW JSON Link, https://github.com/beached/daw_json_link , you can create an iterator pair/range and iterate over the JSON array one record at a time. The library also has routines for working with JSONL (JSON Lines), which is common in large datasets.
For opening the file, I would use something like mmap/VirtualAlloc. The examples in the library do this via the daw::filesystem::memory_mapped_file_t type, which abstracts the file mapping.
With that, the memory-mapped file lets the OS page the data in and out as needed, and the iterator-like interface keeps the memory requirement to that of one array element at a time.
The following demonstrates this, using a simple record type:
struct Point {
  double x;
  double y;
};
The program to do this looks like
#include <cassert>
#include <daw/daw_memory_mapped_file.h>
#include <daw/json/daw_json_iterator.h>
#include <daw/json/daw_json_link.h>
#include <iostream>

struct Point {
  double x;
  double y;
};

namespace daw::json {
  template<>
  struct json_data_contract<Point> {
    using type =
      json_member_list<json_number<"x">, json_number<"y">>;
  };
}

int main( int argc, char **argv ) {
  assert( argc >= 2 ); // expect the JSON file path as the first argument
  // Map the file into memory so the OS can page it in/out as needed
  auto json_doc = daw::filesystem::memory_mapped_file_t<char>( argv[1] );
  assert( json_doc.size( ) > 2 );
  // Lazily iterate over the top-level array, one Point at a time
  auto json_range = daw::json::json_array_range<Point>( json_doc );
  auto sum_x = 0.0;
  auto sum_y = 0.0;
  auto count = 0ULL;
  for( Point p : json_range ) {
    sum_x += p.x;
    sum_y += p.y;
    ++count;
  }
  sum_x /= static_cast<double>( count );
  sum_y /= static_cast<double>( count );
  std::cout << "Centre Point (" << sum_x << ", " << sum_y << ")\n";
}
https://jsonlink.godbolt.org/z/xoxEd1z6G
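For reference, given a hypothetical input file such as [{"x":1.0,"y":2.0},{"x":3.0,"y":4.0}] passed as the first argument, this should print Centre Point (2, 3), while only ever materializing one Point at a time.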
Upvotes: 1
Reputation: 26
Could you please specify the context of your question?
Processing large JSON files can consume a lot of memory on a server and may even crash your app. I have experienced first-hand that manipulating large JSON files on my local computer with 8 GB of RAM is not a problem using a Node.js script to process the large JSON payloads. However, running those same large payloads in an application on a server has given me problems.
I hope this helps.
Upvotes: 0
Reputation: 249093
RapidJSON and others can do it. Here's an example program using RapidJSON's "SAX" (streaming) API: https://github.com/Tencent/rapidjson/blob/master/example/simplereader/simplereader.cpp
This way, you'll get an event (callback) for each element encountered during parsing. The memory consumption of the parsing itself will be quite small.
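To illustrate the idea, here is a minimal sketch of that SAX style using a fixed-size read buffer. It is not the linked example: the CountingHandler below just counts numeric values as they stream past, and the buffer size and file handling are placeholder choices. Memory use stays bounded by the buffer rather than the file size.

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <rapidjson/filereadstream.h>
#include <rapidjson/reader.h>

// Counts every numeric value encountered while streaming the document.
struct CountingHandler
  : rapidjson::BaseReaderHandler<rapidjson::UTF8<>, CountingHandler> {
  std::size_t numbers = 0;
  bool Int( int ) { ++numbers; return true; }
  bool Uint( unsigned ) { ++numbers; return true; }
  bool Int64( std::int64_t ) { ++numbers; return true; }
  bool Uint64( std::uint64_t ) { ++numbers; return true; }
  bool Double( double ) { ++numbers; return true; }
  // All other events fall back to BaseReaderHandler's defaults, which accept them.
};

int main( int argc, char **argv ) {
  if( argc < 2 ) { return 1; }
  std::FILE *fp = std::fopen( argv[1], "rb" );
  if( !fp ) { return 1; }
  char buffer[65536]; // fixed-size read buffer; parsing never loads the whole file
  rapidjson::FileReadStream is( fp, buffer, sizeof( buffer ) );
  CountingHandler handler;
  rapidjson::Reader reader;
  reader.Parse( is, handler ); // fires one callback per JSON event
  std::fclose( fp );
  std::printf( "numeric values seen: %zu\n", handler.numbers );
}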
Upvotes: 2