Reputation: 2065
I'm trying to write many JSON files to a CSV file. Each JSON file has several keys, but different files have different keys. Here are three JSON files as an example.
file A:
{"a": 1, "c": 2}
file B:
{"b": 5, "d": 3}
file C:
{"a": 6, "b": 7}
I'd like one CSV file like this with four columns and three rows (commas omitted for simplicity):
a b c d
1 2
5 3
6 7
One way to do this is by multiple try/except statements using csv writer. But that becomes infeasible as I am dealing with a large number of keys. Are there any alternatives?
Upvotes: 1
Views: 2928
Reputation: 21453
Assuming you know all the possible field names ahead of time csv.DictWriter
already comes with a solution for this, use the restval
argument to the constructor:
If the row read has fewer fields than the fieldnames sequence, the remaining keys take the value of the optional
restval
parameter.
so specifying csv.DictWriter(..., restval=" ")
would replace any missing values with a single space although by default restval
is set to ""
(an empty string) which will probably be more useful to you anyway.
so basically your code would look like this:
import csv, json
all_fields = ["a","b","c","d"]
all_files = ["A.json","B.json","C.json"]
with open("OUTPUT.csv", "w") as output_file:
writer = csv.DictWriter(output_file,all_fields)
writer.writeheader()
for filename in all_files:
with open(filename,"r") as in_file:
writer.writerow(json.load(in_file))
Upvotes: 4
Reputation: 11942
This works :
csv_separator = ';'
data = [{"a": 1, "c": 2},
{"b": 5, "d": 3},
{"a": 6, "b": 7}]
headers = sorted(list(set(sum([list(l.keys()) for l in data], []))))
with open('output.csv', 'w+') as f:
f.write(csv_separator.join(headers))
for l in data:
line_elements = []
for k in headers:
try:
line_elements.append(str(l[k]))
except: # key not in dict, append empty string, i'll let you catch the exception properly
line_elements.append('')
f.write(csv_separator.join(line_elements))
# Output :
# a;b;c;d
# 1;;2;
# ;5;;3
# 6;7;;
Upvotes: 0
Reputation: 109636
You can append each JSON file to a list and then create dataframes and concatenate.
a = {"a": 1, "c": 2}
b = {"b": 5, "d": 3}
c = {"a": 6, "b": 7}
data = [a, b, c]
>>> pd.concat([pd.DataFrame(s, index=[0]) for s in data]).reset_index()
a b c d
0 1 NaN 2 NaN
1 NaN 5 NaN 3
2 6 7 NaN NaN
Upvotes: 2
Reputation: 3724
You could load each individual dictionary with the missing keys and give them null values. So it might look like this
for items in list:
for x in ['a','b','c','d']:
if x not in item:
item[x] = ""
Now that each dictionary has the same keys, you should be able to write the csv easily in the format you want.
Upvotes: 2