Ian

Reputation: 1467

Read and parse a >400MB .json file in Julia without crashing kernel

The following is crashing my Julia kernel. Is there a better way to read and parse a large (>400 MB) JSON file?

using JSON
data = JSON.parsefile("file.json") 

Upvotes: 10

Views: 1386

Answers (1)

Dan Getz

Reputation: 18217

Unless some effort is invested in writing a smarter JSON parser, the following might work: there is a good chance file.json has many lines. In that case, reading the file and parsing a large repetitive JSON section line-by-line or chunk-by-chunk (with the right chunk length) could do the trick. A possible way to code this would be:

using JSON
f = open("file.json","r")

discard_lines = 12      # header lines before the repetitive part
important_chunks = 1000 # number of data items
chunk_length = 2        # each data item spans a 2-line JSON chunk

# skip the header lines
for i=1:discard_lines
    readline(f)
end

# parse each chunk separately and collect the results
thedata = Any[]
for i=1:important_chunks
    chunk = join([readline(f) for j=1:chunk_length])
    push!(thedata,JSON.parse(chunk))
end
close(f)
# use thedata

There is a good chance this could serve as a temporary stopgap solution for your problem. Inspect file.json to find out whether it has such a repetitive structure.
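To do that inspection without loading the whole 400 MB file, you can peek at just the first few lines. Here is a minimal sketch (the helper name `peek_lines` is hypothetical, not part of any library) that you could use to choose sensible values for `discard_lines` and `chunk_length`:

```julia
# Hypothetical helper: return up to the first n lines of a file, so its
# repetitive structure can be inspected before picking discard_lines,
# important_chunks and chunk_length above.
function peek_lines(path::AbstractString, n::Integer)
    open(path, "r") do f
        [readline(f) for _ in 1:n if !eof(f)]
    end
end

# e.g. print a 20-line preview of the file from the question:
# foreach(println, peek_lines("file.json", 20))
```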

Upvotes: 4
