Reputation: 386
I am fiddling around with outputting a JSON file with some attributes of the files within a directory. My problem is that when appending to the file there is no separator between the objects. I could just add a comma after each 'f' and delete the last one, but that seems like a sloppy workaround to me.
import os
import os.path
import json

# Create and open file_data.txt and append
with open('file_data.txt', 'a') as outfile:
    files = os.listdir(os.curdir)
    for f in files:
        extension = os.path.splitext(f)[1][1:]
        base = os.path.splitext(f)[0]
        name = f
        data = {
            "file_name": name,
            "extension": extension,
            "base_name": base
        }
        json.dump(data, outfile)
This outputs:
{"file_name": "contributors.txt", "base_name": "contributors", "extension": "txt"}{"file_name": "read_files.py", "base_name": "read_files", "extension": "py"}{"file_name": "file_data.txt", "base_name": "file_data", "extension": "txt"}{"file_name": ".git", "base_name": ".git", "extension": ""}
What I would like is actual JSON:
{"file_name": "contributors.txt", "base_name": "contributors", "extension": "txt"},{"file_name": "read_files.py", "base_name": "read_files", "extension": "py"},{"file_name": "file_data.txt", "base_name": "file_data", "extension": "txt"}{"file_name": ".git", "base_name": ".git", "extension": ""}
Upvotes: 12
Views: 31257
Reputation: 376
I had the same problem: I needed to yield objects into a file one at a time because I didn't want to load the whole list of objects into memory. Here's my approach (though I think it's a bit hacky):
import os
import os.path
import json

json_begin = '{"objects":['
json_end = ']}'

with open('file_data.txt', 'a') as outfile:
    files = os.listdir(os.curdir)
    outfile.write(json_begin)
    for f in files:
        extension = os.path.splitext(f)[1][1:]
        base = os.path.splitext(f)[0]
        name = f
        data = {
            "file_name": name,
            "extension": extension,
            "base_name": base
        }
        json.dump(data, outfile)
        if f != files[-1]:  # directory entries are unique, so this matches only the last file
            outfile.write(',')
    outfile.write(json_end)
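The same wrap-in-an-array idea can be sanity-checked against an in-memory buffer (the records here are made-up stand-ins for the per-file dicts):

```python
import io
import json

# Illustrative records standing in for the per-file dicts.
records = [{"file_name": "a.txt"}, {"file_name": "b.py"}]

buf = io.StringIO()
buf.write('{"objects":[')
for i, data in enumerate(records):
    if i:  # comma before every record except the first
        buf.write(',')
    json.dump(data, buf)
buf.write(']}')

# The buffer now holds valid JSON that round-trips through json.loads.
parsed = json.loads(buf.getvalue())
```

Writing the comma before every record except the first avoids needing to know in advance which record is last.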
Upvotes: 3
Reputation: 365657
What you're getting is not a JSON object, but a stream of separate JSON objects.
What you would like is still not a JSON object, but a stream of separate JSON objects with commas between them. That's not going to be any more parseable.*
* The JSON spec is simple enough to parse by hand, and it should be pretty clear that an object followed by another object with a comma in between doesn't match any valid production.
If you're trying to create a JSON array, you can do that. The obvious way, unless there are memory issues, is to build a list of dicts, then dump that all at once:
output = []
for f in files:
    # ... build the data dict for f as above ...
    output.append(data)
json.dump(output, outfile)
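Filled in as a self-contained sketch (the file names are hardcoded stand-ins for os.listdir), the collect-then-dump approach emits one valid JSON array:

```python
import json
import os.path

# Hypothetical file names standing in for os.listdir(os.curdir).
files = ["contributors.txt", "read_files.py"]

output = []
for f in files:
    base, ext = os.path.splitext(f)   # ("contributors", ".txt")
    output.append({"file_name": f, "extension": ext[1:], "base_name": base})

text = json.dumps(output)  # one valid JSON array, dumped once
```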
If memory is an issue, you have a few choices:

- Write the opening [ , the commas between values, and the closing ] manually, with json.dump for each object in between. (But note that it is not valid JSON to have an extra trailing comma after the last value, even if some decoders will accept it.)
- Make data an iterator, and extend JSONEncoder to convert iterators to arrays. (Note that this is actually used as the example in the docs of why and how to extend JSONEncoder, although you might want to write a more memory-efficient implementation.)

However, it's worth considering what you're trying to do. Maybe a stream of separate JSON objects actually is the right file format/protocol/API for what you're trying to do. Because JSON is self-delimiting, there's really no reason to add a delimiter between separate values. (And it doesn't even help much with robustness, unless you use a delimiter that isn't going to show up all over the actual JSON, as , is.) For example, what you've got is exactly what JSON-RPC is supposed to look like.

If you're just asking for something different because you don't know how to parse such a file, that's pretty easy. For example (using a string rather than a file for simplicity):
def iter_json_objects(s):
    # Repeatedly decode one value and pick up where the last one ended.
    i = 0
    d = json.JSONDecoder()
    while True:
        try:
            obj, i = d.raw_decode(s, i)  # returns (value, index just past it)
        except ValueError:               # no complete value left
            return
        yield obj
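Repeating that generator so the snippet stands alone (the function name is just illustrative), feeding it input shaped like the question's output yields one dict per concatenated object:

```python
import json

def iter_json_objects(s):
    """Yield each top-level JSON value from a concatenated string."""
    i = 0
    d = json.JSONDecoder()
    while True:
        try:
            obj, i = d.raw_decode(s, i)  # (value, index just past it)
        except ValueError:               # no complete value left
            return
        yield obj

stream = '{"a": 1}{"b": 2}{"c": 3}'
objs = list(iter_json_objects(stream))
```

raw_decode stops at the end of the first complete value, which is what makes parsing back-to-back objects possible without any separator.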
Upvotes: 19