SachinB

Reputation: 348

How can we iterate over a JSON file of around 2 GB in Java?

I am reading 3 column values from the database (around 50,000 records) and then trying to search for each value in a JSON file. The JSON file contains 2 million JSON objects. I have tried the approaches below.

Approach 1.

JSONArray json = readJson(Constants.jsonFilePath);

private JSONArray readJson(String jsonFilePath) {
    String content = null;
    File file = new File(jsonFilePath); // use the parameter rather than the constant
    try {
        content = FileUtils.readFileToString(file, "utf-8");
    } catch (IOException e) {
        e.printStackTrace();
    }
    return new JSONArray(content);
}

and then searching linearly for the required field value.

I tested the above code against a file of 150 MB and it worked very well. But when I tested it against a 2 GB file, I got an OutOfMemoryError.

Approach 2:

Then I tried reading 100,000 JSON objects at a time from the file and checking the required field value, but the process was tremendously slow.

I am using the org.json library. Is there a better way to solve this problem?

Upvotes: 3

Views: 1873

Answers (3)

Trần Nam Trung

Reputation: 113

Have you tried writing your own JSON parser for this specific object format? Since you know the JSON structure in advance, you could parse one object at a time (for example, read characters until the first '{' is matched by its closing '}') and compare each object against the search values. You could also reduce the runtime with a multithreaded approach.

This is just an idea; I don't know exactly what your JSON file looks like.
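A minimal sketch of that idea in plain Java, assuming the file is an array of objects and that no '{' or '}' characters appear inside string values (a real parser would also have to track quoting; class and method names here are just illustrative):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

public class ObjectChunker {
    // Splits a stream containing a JSON array of objects into one string per
    // object by tracking brace depth, so only a single object is held in
    // memory at a time. Simplification: assumes no braces inside strings.
    public static List<String> splitObjects(Reader source) throws IOException {
        List<String> objects = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        int c;
        BufferedReader in = new BufferedReader(source);
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (ch == '{') depth++;
            if (depth > 0) current.append(ch);   // inside an object: buffer it
            if (ch == '}') {
                depth--;
                if (depth == 0) {                // object closed: emit and reset
                    objects.add(current.toString());
                    current.setLength(0);
                }
            }
        }
        return objects;
    }
}
```

Each emitted string can then be matched against the search values (or handed to a worker thread) without ever holding the whole file in memory.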

Upvotes: 0

Yogesh_D

Reputation: 18764

You should use a streaming JSON parser rather than reading the whole file into memory. It will still take time, but memory use stays manageable. Look at the Jackson Streaming API to see how to achieve this.

This does mean that you will have to handle low-level processing of the JSON tokens yourself, but it will be far lighter than loading all of the JSON into memory.

Here is a link to using the Streaming API.

Note that GSON also has a similar streaming API.
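A minimal sketch of the Jackson streaming approach, assuming the file is a top-level array of objects and that the field being searched is called `id` (both assumptions; adapt the field name and structure to your actual file):

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import java.io.File;
import java.io.IOException;
import java.util.Set;

public class JsonScanner {
    // Streams a top-level JSON array and counts objects whose "id" field
    // matches one of the search values, never loading the whole file.
    public static int countMatches(File file, Set<String> searchValues) throws IOException {
        int matches = 0;
        JsonFactory factory = new JsonFactory();
        try (JsonParser parser = factory.createParser(file)) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IOException("Expected a top-level JSON array");
            }
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                // walk the fields of this one object
                while (parser.nextToken() != JsonToken.END_OBJECT) {
                    String field = parser.getCurrentName();
                    parser.nextToken(); // advance to the field's value
                    if ("id".equals(field) && searchValues.contains(parser.getText())) {
                        matches++;
                    } else {
                        parser.skipChildren(); // skips nested objects/arrays; no-op on scalars
                    }
                }
            }
        }
        return matches;
    }
}
```

Since you already have the 50,000 database values, putting them in a `HashSet` (as above) lets you do all the lookups in a single pass over the 2 GB file instead of rescanning it per value.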

Upvotes: 2

jwenting

Reputation: 5663

Of course it's going to be slow; it's a massive amount of data. Splitting it into more manageable chunks is the only thing you can do, and you'll have to accept the performance hit as a cost of doing business, because the data simply won't fit in memory.

Of course you can tell the JVM to claim 4 GB of RAM and hope that's enough, but it will still take quite a bit of time to process that much data.
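Raising the heap is just a JVM flag; for example (`json-search.jar` is a placeholder for whatever program does the processing):

```shell
# give the JVM a 4 GB maximum heap, and start it there to avoid resizing
java -Xmx4g -Xms4g -jar json-search.jar
```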

Which leaves the question of why you're handling such a large single JSON document at all. There are far better ways to store bulk data that are far less CPU- and RAM-intensive to process. Databases come to mind, nicely searchable using SQL or similar query languages.

At this point you're running not just into the limits of what can be sensibly expected from your JVM but your operating system itself.

Upvotes: 4
