user62117

Reputation: 35

Loading and reading a JSON file is very slow after writing it in Python

I'm trying to read in a JSON file, determine how many words are in the "text" field, add that information as a new field, "length", and write the new JSON object to a file. I've done that with the following code:

import json

with open("file_read.json", "r") as review_file, open(
    "file_write.json", "w") as review_write:
    for line in review_file:
        review_object = json.loads(line)
        review_object["length"] = len(review_object["text"].split())
        json.dump(review_object, review_write)

The original file is over 200 MB, but I can view it fine with Vim; however, the file I write, which is only about 3 MB larger, takes a very long time to load, if it loads at all. Furthermore, there are issues even if I read only the first JSON object. I tried the following after writing the file:

with open("file_write", "r") as review_file:
    print review_file.readline()
    print("abcd123")

I'm using Vim with python-mode, and when I scroll over the first printed line, which contains the JSON output, it is very choppy, but the second line is not.

Upvotes: 1

Views: 8045

Answers (1)

Salem

Reputation: 12986

The way you are writing your file, you end up with one single HUGE line.

import json

# example: three consecutive dumps into the same file
with open("example.json", "w") as fp:
    json.dump([1, 2, 3], fp)
    json.dump({"name": "abc"}, fp)
    json.dump(33, fp)

# content of the file:
# [1, 2, 3]{"name": "abc"}33

This may explain why it is so slow to read: the editor has to load ~200 MB of text as a single line. Also, loading it as JSON will probably fail.
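For illustration, parsing that concatenated content as a single JSON document stops after the first value and complains about the rest (a minimal sketch, assuming the example file from the snippet above):

import json

with open("example.json") as fp:
    data = fp.read()  # '[1, 2, 3]{"name": "abc"}33'

try:
    json.loads(data)
except json.JSONDecodeError as exc:
    print(exc)  # e.g. Extra data: line 1 column 10 (char 9)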

To solve it, you can write one JSON object per line instead:

fp.write(json.dumps(review_object) + "\n")
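Putting that back into the loop from the question, a sketch of the rewrite (reusing the file names from the question) would be:

import json

with open("file_read.json", "r") as review_file, open(
        "file_write.json", "w") as review_write:
    for line in review_file:
        review_object = json.loads(line)
        review_object["length"] = len(review_object["text"].split())
        # one JSON object per line (JSON Lines), so readers and editors
        # can process the file line by line
        review_write.write(json.dumps(review_object) + "\n")

Each line of file_write.json is then a complete JSON document that can be read back with json.loads(line), just like the original input file.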

Upvotes: 5
