wwl
wwl

Reputation: 2065

Convert JSON to CSV when file has different keys using Python?

I'm trying to write many JSON files to a CSV file. Each JSON file has several keys, but different files have different keys. Here are three JSON files as an example.

file A:

{"a": 1, "c": 2}

file B:

{"b": 5, "d": 3}

file C:

{"a": 6, "b": 7}

I'd like one CSV file like this with four columns and three rows (commas omitted for simplicity):

a b c d

1   2 

  5   3

6 7 

One way to do this is by multiple try/except statements using csv writer. But that becomes infeasible as I am dealing with a large number of keys. Are there any alternatives?

Upvotes: 1

Views: 2928

Answers (4)

Tadhg McDonald-Jensen
Tadhg McDonald-Jensen

Reputation: 21453

Assuming you know all the possible field names ahead of time csv.DictWriter already comes with a solution for this, use the restval argument to the constructor:

If the row read has fewer fields than the fieldnames sequence, the remaining keys take the value of the optional restval parameter.

so specifying csv.DictWriter(..., restval=" ") would replace any missing values with a single space although by default restval is set to "" (an empty string) which will probably be more useful to you anyway.

so basically your code would look like this:

import csv, json
all_fields = ["a","b","c","d"]
all_files = ["A.json","B.json","C.json"]

with open("OUTPUT.csv", "w") as output_file:
    writer = csv.DictWriter(output_file,all_fields)
    writer.writeheader()

    for filename in all_files:
        with open(filename,"r") as in_file:
            writer.writerow(json.load(in_file))

Upvotes: 4

Loïc
Loïc

Reputation: 11942

This works :

csv_separator = ';'

data = [{"a": 1, "c": 2},
{"b": 5, "d": 3},
{"a": 6, "b": 7}]

headers = sorted(list(set(sum([list(l.keys()) for l in data], []))))

with open('output.csv', 'w+') as f:
    f.write(csv_separator.join(headers))
    for l in data:
        line_elements = []
        for k in headers:
            try:
                line_elements.append(str(l[k]))
            except: # key not in dict, append empty string, i'll let you catch the exception properly
                line_elements.append('')
        f.write(csv_separator.join(line_elements))


# Output : 
# a;b;c;d
# 1;;2;
# ;5;;3
# 6;7;;

Upvotes: 0

Alexander
Alexander

Reputation: 109636

You can append each JSON file to a list and then create dataframes and concatenate.

a = {"a": 1, "c": 2}
b = {"b": 5, "d": 3}
c = {"a": 6, "b": 7}
data = [a, b, c]

>>> pd.concat([pd.DataFrame(s, index=[0]) for s in data]).reset_index()
    a   b   c   d
0   1 NaN   2 NaN
1 NaN   5 NaN   3
2   6   7 NaN NaN

Upvotes: 2

Liam Kelly
Liam Kelly

Reputation: 3724

You could load each individual dictionary with the missing keys and give them null values. So it might look like this

for items in list:
    for x in ['a','b','c','d']:
        if x not in item:
            item[x] = ""

Now that each dictionary has the same keys, you should be able to write the csv easily in the format you want.

Upvotes: 2

Related Questions