Reputation: 57
currently working with jsonl files and I intend to convert it into CSV format to run it through a program. However, I realize that it would be better to convert it from json directly to CSV instead, and I wrote a code below to convert json to csv. However, I am unsure on how I can convert my current jsonl files into the desired json format before I can run this code. If anyone has any solutions for me, do let me know! Thanks so much for the read and appreciate all the help that I can get.
(FYI I tried to convert the jsonl file directly using the json to csv converter below and I get an error message below:)
Converting to CSV: XXX.jsonl
ERROR: Extra data
This is the conversion code, I hope it helps!
from json.decoder import JSONDecodeError
import pandas as pd
import sys
from flatten_json import flatten
import json
def main():
if len(sys.argv) not in [1, 2]:
sys.exit("Usage: python JsonCon.py [FILENAME.json] \n exitted")
filename = sys.argv[1]
print(f"Converting to CSV: {filename}")
convertFile(filename)
def convertFile(filename):
try:
with open(filename) as file:
jsString = json.load(file)
dic_flat = [flatten(d) for d in jsString]
df = pd.DataFrame(dic_flat)
df.to_csv(f'{filename[:-5]}.csv')
except JSONDecodeError as e:
print(f'ERROR: {e.msg}')
if __name__ == "__main__":
main()
Upvotes: 0
Views: 5001
Reputation: 4368
import json
import csv
import io
# get the JSON objects from JSONL
jsonl_data = """{"a": 1, "b": 123}\n{"a": 2, "b": 234}\n{"a": 3, "b": 345}\n"""
json_lines = tuple(json_line
for json_line in jsonl_data.splitlines()
if json_line.strip())
jsons_objs = tuple(json.loads(json_line)
for json_line in json_lines)
# write them into a CSV file
fake_file = io.StringIO()
writer = csv.writer(fake_file)
writer.writerow(["a", "b"])
writer.writerows((value for key, value in sorted(json_obj.items()))
for json_obj in jsons_objs)
print(fake_file.getvalue())
Upvotes: 1