Reputation: 105
I have multiple files, each containing multiple highly nested json rows. The two first rows of one such file look like:
{
"u":"28",
"evv":{
"w":{
"1":400,
"2":{
"i":[{
"l":14,
"c":"7",
"p":"4"
}
]
}
}
}
}
{
"u":"29",
"evv":{
"w":{
"3":400,
"2":{
"i":[{
"c":14,
"y":"7",
"z":"4"
}
]
}
}
}
}
they are actually rows, I just wrote them here this way for more visibility.
My question is the following:
Is there any way to convert all these files to one (or multiple, i.e. one per file) csv/excel... ?
Is there any simple way, that doesn't require writing dozens, or hundreds of lines in Python, specific to my file, to convert all these files to one (or multiple, i.e. one per file) csv/excel... ? One example would be using an external library, script... that handles this particular task, regardless of the names of the fields.
The trap is that some elements do not appear in each line. For example, for the "i" key, we have 3 fields (l, c, p) in the first json, and 3 in the second one (c, y, z). Ideally, the csv should contain as many columns as possible fields (e.g. evv.w.2.i.l, evv.w.2.i.c, evv.w.2.i.p, evv.w.2.i.y, evv.w.2.i.z) at the risk of having (many) null values per csv row.
A possible csv output for this example would have the following columns:
u, evv.w.1, evv.w.3, evv.w.2.i.l, evv.w.2.i.c, evv.w.2.i.p, evv.w.2.i.y, evv.w.2.i.z
Any idea/reference is welcome :)
Thanks
Upvotes: 1
Views: 1047
Reputation: 168776
No, there is no general-purpose program that does precisely what you ask for.
You can, however, write a Python program that does it.
This program might do what you want. It does not have any code specific to your key names, but it is specific to your file format.
import fileinput
import json
import csv
def flattify(d, key=()):
if isinstance(d, list):
result = {}
for i in d:
result.update(flattify(i, key))
return result
if isinstance(d, dict):
result = {}
for k, v in d.items():
result.update(flattify(v, key + (k,)))
return result
return {key: d}
total = []
for line in fileinput.input():
if(line.strip()):
line = json.loads(line)
line = flattify(line)
line = {'.'.join(k): v for k, v in line.items()}
total.append(line)
keys = set()
for d in total:
keys.update(d)
with open('result.csv', 'w') as output_file:
output_file = csv.DictWriter(output_file, sorted(keys))
output_file.writeheader()
output_file.writerows(total)
Upvotes: 1
Reputation: 26642
Please check if this (python3) solution works for you.
import json
import csv
with open('test.json') as data_file:
with open('output.csv', 'w', newline='') as fp:
for line in data_file:
data = json.loads(line)
output = [[data['u'], data['evv']['w'].get('1'), data['evv']['w'].get('3'),
data['evv']['w'].get('2')['i'][0].get('l'), data['evv']['w'].get('2')['i'][0].get('c'),
data['evv']['w'].get('2')['i'][0].get('p'), data['evv']['w'].get('2')['i'][0].get('y'),
data['evv']['w'].get('2')['i'][0].get('z')]]
a = csv.writer(fp, delimiter=',')
a.writerows(output)
test.json
{ "u": "28", "evv": { "w": { "1": 400, "2": { "i": [{ "l": 14, "c": "7", "p": "4" }] } } }}
{"u":"29","evv":{ "w":{ "3":400, "2":{ "i":[{ "c":14, "y":"7", "z":"4" } ] } } }}
output
python3 pyprog.py
dac@dac-Latitude-E7450 ~/P/pyprog> more output.csv
28,400,,14,7,4,,
29,,400,,14,,7,4
Upvotes: 1