VPfB

Reputation: 17247

How to efficiently decode a large number of small JSON data chunks?

I'm going to write a parser for a log file where each line is one JSON record.

I could decode each line in a loop:

logs = [json.loads(line) for line in lines]

or I could decode the whole file in one go:

logs = json.loads('[' + ','.join(lines) + ']')

I want to minimize the execution time; please disregard other factors. Is there any reason to prefer one approach over the other?

Upvotes: 0

Views: 160

Answers (2)

niemmi

Reputation: 17263

You can easily test it with timeit:

$ python -m timeit -s 'import json; lines = ["{\"foo\":\"bar\"}"] * 1000' '[json.loads(line) for line in lines]'
100 loops, best of 3: 2.22 msec per loop
$ python -m timeit -s 'import json; lines = ["{\"foo\":\"bar\"}"] * 1000' "json.loads('[' + ','.join(lines) + ']')"
1000 loops, best of 3: 839 usec per loop

In this case, combining the data and parsing it in one go is about 2.5 times faster.
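As a rough illustration of the faster approach applied to an actual file, here is a minimal sketch; the filename app.log and the skipping of blank lines are assumptions, not part of the question:

import json

# Hypothetical filename; blank lines are skipped as an assumption,
# since a trailing newline would otherwise produce an empty element.
with open("app.log", "r", encoding="utf-8") as f:
    lines = [line.strip() for line in f if line.strip()]

# Wrap the records in a JSON array and parse everything in one call.
logs = json.loads('[' + ','.join(lines) + ']')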

Upvotes: 3

Raskayu

Reputation: 743

Alternatively, you could write the log as a single JSON document, for example:

{
    "log": {
        "line1": {...},
        "line2": {...},
        ...
    }
}

Then you can load the whole file into a Python dictionary (e.g. with json.load) and access the records directly, without parsing each line separately.
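A minimal sketch of that idea, assuming the whole file is valid JSON with the structure above (the filename structured_log.json is hypothetical):

import json

# Load the whole document once; each record is then a dictionary entry,
# so no per-line parsing is needed. The filename is a placeholder.
with open("structured_log.json", "r", encoding="utf-8") as f:
    data = json.load(f)

for name, record in data["log"].items():
    print(name, record)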

Upvotes: 0
