honnielemons
honnielemons

Reputation: 57

Python: Converting JsonL to Json to CSV

currently working with jsonl files and I intend to convert it into CSV format to run it through a program. However, I realize that it would be better to convert it from json directly to CSV instead, and I wrote a code below to convert json to csv. However, I am unsure on how I can convert my current jsonl files into the desired json format before I can run this code. If anyone has any solutions for me, do let me know! Thanks so much for the read and appreciate all the help that I can get.

(FYI I tried to convert the jsonl file directly using the json to csv converter below and I get an error message below:)

Converting to CSV: XXX.jsonl
ERROR: Extra data

This is the conversion code, I hope it helps!

from json.decoder import JSONDecodeError
import pandas as pd
import sys
from flatten_json import flatten
import json

def main():
    if len(sys.argv) not in [1, 2]:
        sys.exit("Usage: python JsonCon.py [FILENAME.json] \n exitted")

    filename = sys.argv[1]
    print(f"Converting to CSV: {filename}")
    convertFile(filename)

def convertFile(filename):
    try:
        with open(filename) as file:
            jsString = json.load(file)
            dic_flat = [flatten(d) for d in jsString]
            df = pd.DataFrame(dic_flat)
            df.to_csv(f'{filename[:-5]}.csv')
    except JSONDecodeError as e:
        print(f'ERROR: {e.msg}')

if __name__ == "__main__":
    main()

Upvotes: 0

Views: 5001

Answers (1)

Lenormju
Lenormju

Reputation: 4368

import json
import csv
import io

# get the JSON objects from JSONL
jsonl_data = """{"a": 1, "b": 123}\n{"a": 2, "b": 234}\n{"a": 3, "b": 345}\n"""
json_lines = tuple(json_line
                   for json_line in jsonl_data.splitlines()
                   if json_line.strip())
jsons_objs = tuple(json.loads(json_line)
                   for json_line in json_lines)

# write them into a CSV file
fake_file = io.StringIO()
writer = csv.writer(fake_file)
writer.writerow(["a", "b"])
writer.writerows((value for key, value in sorted(json_obj.items()))
                 for json_obj in jsons_objs)
print(fake_file.getvalue())

Upvotes: 1

Related Questions