Reputation: 386
I am fiddling around with outputting a JSON file with some attributes of the files within a directory. My problem is that when appending to the file there is no separator between the objects. I could just add a comma after each 'f' and delete the last one, but that seems like a sloppy workaround to me.
import os
import os.path
import json

# Create and open file_data.txt and append
with open('file_data.txt', 'a') as outfile:
    files = os.listdir(os.curdir)
    for f in files:
        extension = os.path.splitext(f)[1][1:]
        base = os.path.splitext(f)[0]
        name = f
        data = {
            "file_name": name,
            "extension": extension,
            "base_name": base
        }
        json.dump(data, outfile)
This outputs:
{"file_name": "contributors.txt", "base_name": "contributors", "extension": "txt"}{"file_name": "read_files.py", "base_name": "read_files", "extension": "py"}{"file_name": "file_data.txt", "base_name": "file_data", "extension": "txt"}{"file_name": ".git", "base_name": ".git", "extension": ""}
What I would like is actual JSON:
{"file_name": "contributors.txt", "base_name": "contributors", "extension": "txt"},{"file_name": "read_files.py", "base_name": "read_files", "extension": "py"},{"file_name": "file_data.txt", "base_name": "file_data", "extension": "txt"}{"file_name": ".git", "base_name": ".git", "extension": ""}
Upvotes: 12
Views: 31257
Reputation: 376
I had the same problem: I needed to yield objects into a file one at a time because I didn't want to load the whole list of objects into memory. Here's my approach (though I think it's a bit hacky):
import os
import os.path
import json

json_begin = '{"objects":['
json_end = ']}'

with open('file_data.txt', 'a') as outfile:
    files = os.listdir(os.curdir)
    outfile.write(json_begin)
    for f in files:
        extension = os.path.splitext(f)[1][1:]
        base = os.path.splitext(f)[0]
        name = f
        data = {
            "file_name": name,
            "extension": extension,
            "base_name": base
        }
        json.dump(data, outfile)
        if f != files[-1]:  # directory entries are unique, so this matches only the last file
            outfile.write(',')
    outfile.write(json_end)
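The same wrap-in-an-array idea can be sanity-checked against an in-memory buffer (the records here are made-up stand-ins for the per-file dicts):

```python
import io
import json

# Illustrative records standing in for the per-file dicts.
records = [{"file_name": "a.txt"}, {"file_name": "b.py"}]

buf = io.StringIO()
buf.write('{"objects":[')
for i, data in enumerate(records):
    if i:  # comma before every record except the first
        buf.write(',')
    json.dump(data, buf)
buf.write(']}')

# The buffer now holds valid JSON that round-trips through json.loads.
parsed = json.loads(buf.getvalue())
```

Writing the comma before every record except the first avoids needing to know in advance which record is last.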
Upvotes: 3
Reputation: 365657
What you're getting is not a JSON object, but a stream of separate JSON objects.
What you would like is still not a JSON object, but a stream of separate JSON objects with commas between them. That's not going to be any more parseable.*
* The JSON spec is simple enough to parse by hand, and it should be pretty clear that an object followed by another object with a comma in between doesn't match any valid production.
If you're trying to create a JSON array, you can do that. The obvious way, unless there are memory issues, is to build a list of dicts, then dump that all at once:
output = []
for f in files:
    # ... build the data dict for f as above ...
    output.append(data)
json.dump(output, outfile)
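Filled in as a self-contained sketch (the file names are hardcoded stand-ins for os.listdir), the collect-then-dump approach emits one valid JSON array:

```python
import json
import os.path

# Hypothetical file names standing in for os.listdir(os.curdir).
files = ["contributors.txt", "read_files.py"]

output = []
for f in files:
    base, ext = os.path.splitext(f)   # ("contributors", ".txt")
    output.append({"file_name": f, "extension": ext[1:], "base_name": base})

text = json.dumps(output)  # one valid JSON array, dumped once
```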
If memory is an issue, you have a few choices:

- Write the opening [ , the commas between values, and the closing ] manually, with json.dump for each object in between. (But note that it is not valid JSON to have an extra trailing comma after the last value, even if some decoders will accept it.)
- Make data an iterator, and extend JSONEncoder to convert iterators to arrays. (Note that this is actually used as the example in the docs of why and how to extend JSONEncoder, although you might want to write a more memory-efficient implementation.)

However, it's worth considering what you're trying to do. Maybe a stream of separate JSON objects actually is the right file format/protocol/API for what you're trying to do. Because JSON is self-delimiting, there's really no reason to add a delimiter between separate values. (And it doesn't even help much with robustness, unless you use a delimiter that isn't going to show up all over the actual JSON, as , is.) For example, what you've got is exactly what JSON-RPC is supposed to look like.

If you're just asking for something different because you don't know how to parse such a file, that's pretty easy. For example (using a string rather than a file for simplicity):
def iter_json_objects(s):
    # Repeatedly decode one value and pick up where the last one ended.
    i = 0
    d = json.JSONDecoder()
    while True:
        try:
            obj, i = d.raw_decode(s, i)  # returns (value, index just past it)
        except ValueError:               # no complete value left
            return
        yield obj
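Repeating that generator so the snippet stands alone (the function name is just illustrative), feeding it input shaped like the question's output yields one dict per concatenated object:

```python
import json

def iter_json_objects(s):
    """Yield each top-level JSON value from a concatenated string."""
    i = 0
    d = json.JSONDecoder()
    while True:
        try:
            obj, i = d.raw_decode(s, i)  # (value, index just past it)
        except ValueError:               # no complete value left
            return
        yield obj

stream = '{"a": 1}{"b": 2}{"c": 3}'
objs = list(iter_json_objects(stream))
```

raw_decode stops at the end of the first complete value, which is what makes parsing back-to-back objects possible without any separator.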
Upvotes: 19