mchd

Reputation: 3163

How to write a JSONL file in Python

I am trying to write a .jsonl file that needs to look like this:

{"file_name": "0001.png", "text": "This is a golden retriever playing with a ball"}
{"file_name": "0002.png", "text": "A german shepherd"}
{"file_name": "0003.png", "text": "One chihuahua"}

This is my attempt:

import json
import pandas as pd

# read_csv already returns a DataFrame, so no extra conversion is needed
df = pd.read_csv('data.csv')

file_name = df['image']
file_caption = df['text']

data = []

for i in range(len(file_name)):
    entry = {"file_name": file_name[i], "text": file_caption[i]}
    data.append(entry)

json_object = json.dumps(data, indent=4)

# Writing to metadata.jsonl
with open("metadata.jsonl", "w") as outfile:
    outfile.write(json_object)

But this is the output I get:

[
    {
        "file_name": "images/image_0.jpg",
        "text": "Fattoush Salad with Roasted Potatoes"
    },
    {
        "file_name": "images/image_1.jpg",
        "text": "an analysis of self portrayal in novels by virginia woolf A room of one's own study guide contains a biography of virginia woolf, literature essays, quiz questions, major themes, characters, and a full summary and analysis about a room of one's own a room of one's own summary."
    },
    {
        "file_name": "images/image_2.jpg",
        "text": "Christmas Comes Early to U.K. Weekly Home Entertainment Chart"
    },
    {
        "file_name": "images/image_3.jpg",
        "text": "Amy Garcia Wikipedia a legacy of reform: dorothea dix (1802\u20131887) | states of"
    },
    {
        "file_name": "images/image_4.jpg",
        "text": "3D Metal Cornish Harbour Painting"
    },
    {
        "file_name": "images/image_5.jpg",
        "text": "\"In this undated photo provided by the New York City Ballet, Robert Fairchild performs in \"\"In Creases\"\" by choreographer Justin Peck which is being performed by the New York City Ballet in New York. (AP Photo/New York City Ballet, Paul Kolnik)\""
    },
...
]

I know this happens because I am dumping a list, so I can see where I'm going wrong, but how do I create a .jsonl file in the format shown above?

Upvotes: 0

Views: 4125

Answers (1)

Mark Tolonen

Reputation: 177901

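Don't indent the generated JSON and don't collect the entries in a list. A .jsonl file holds one compact JSON object per line, so serialize each entry on its own and write it to the file as a single line: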

import json
import pandas as pd

df = pd.DataFrame([['0001.png', "This is a golden retriever playing with a ball"],
                   ['0002.png', "A german shepherd"],
                   ['0003.png', "One chihuahua"]], columns=['filename','text'])

with open("metadata.jsonl", "w") as outfile:
    for file, caption in zip(df['filename'], df['text']):
        entry = {"file_name": file, "text": caption}
        print(json.dumps(entry), file=outfile)

Output:

{"file_name": "0001.png", "text": "This is a golden retriever playing with a ball"}
{"file_name": "0002.png", "text": "A german shepherd"}
{"file_name": "0003.png", "text": "One chihuahua"}
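
The output in the question shows a dash escaped as \u2013, which is the default behaviour of json.dumps (ensure_ascii=True). If readable non-ASCII text is preferred, one possible variation is sketched below using only the standard library. The helper names write_jsonl and read_jsonl, the temporary path, and the sample records are assumptions made for illustration, not part of the original code. Reading the file back line by line is a quick way to check that every line is valid JSON on its own.

```python
import json
import os
import tempfile


def write_jsonl(path, records):
    """Write one compact JSON object per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as outfile:
        for record in records:
            # ensure_ascii=False keeps characters such as "–" as-is
            # instead of escaping them as \u2013.
            outfile.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Parse each non-empty line back into a Python object."""
    with open(path, encoding="utf-8") as infile:
        return [json.loads(line) for line in infile if line.strip()]


# Sample records for illustration only.
records = [
    {"file_name": "0001.png", "text": "This is a golden retriever playing with a ball"},
    {"file_name": "0002.png", "text": "dorothea dix (1802–1887)"},
]

# Write to a temporary directory so the sketch does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "metadata.jsonl")
write_jsonl(path, records)

# Round trip: the parsed lines should equal the original records.
round_trip = read_jsonl(path)
```

The file must be opened with an explicit encoding (UTF-8 here) when ensure_ascii=False is used, otherwise the platform default encoding may fail on some characters. With the default ensure_ascii=True the file stays pure ASCII and is equally valid JSON Lines.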

Upvotes: 4
