Jed

Reputation: 399

Import JSON Lines into Pandas

I want to import a JSON Lines file into pandas. I tried to read it like a regular JSON file, but it did not work:

js = pd.read_json(r'C:\Users\Name\Downloads\profilenotes.jsonl')

Upvotes: 6

Views: 8080

Answers (3)

Alexander

Reputation: 109696

pd.read_json() reads JSON data into a DataFrame. For a JSON Lines (JSONL) file, where each line is a separate JSON object, pass lines=True so that each line is parsed as its own record:

df = pd.read_json("test.jsonl", lines=True)

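If the file is large, you can also pass chunksize (together with lines=True) to process it in chunks; read_json then returns an iterator of DataFrames instead of a single DataFrame.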
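As a rough sketch of the chunked approach, the snippet below reads JSONL data two rows at a time and stitches the pieces back together. The in-memory sample data and the chunk size of 2 are assumptions made for illustration; with a real file you would pass its path instead of the StringIO object.

```python
import io
import pandas as pd

# Hypothetical in-memory JSONL data standing in for a large file on disk.
jsonl_text = (
    '{"id": 1, "note": "a"}\n'
    '{"id": 2, "note": "b"}\n'
    '{"id": 3, "note": "c"}\n'
)

# With lines=True and chunksize set, read_json returns an iterator
# of DataFrames rather than a single DataFrame.
reader = pd.read_json(io.StringIO(jsonl_text), lines=True, chunksize=2)

chunks = []
for chunk in reader:
    # Each chunk is a DataFrame with at most `chunksize` rows;
    # filter or aggregate here to keep memory use low.
    chunks.append(chunk)

# Recombine only if the processed result fits in memory.
df = pd.concat(chunks, ignore_index=True)
print(df)
```

In practice you would usually reduce each chunk (filter rows, keep a few columns, aggregate) before collecting it, since concatenating every chunk unchanged uses as much memory as reading the file in one go.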

Upvotes: 16

michael-c-michael

Reputation: 1

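If your lines contain nested objects such as {"A": {"B": 3235, "C": 2142}} and you want the DataFrame to have flattened columns like A.B and A.C instead of a single column A holding dicts, parse each line and pass the result to pd.json_normalize():

import json
import pandas as pd

# Read the file and split it into one string per line
with open(r'test.jsonl') as f:
    lines = f.read().splitlines()

# Parse each non-empty line into a dict, then flatten nested keys into dotted column names
data = [json.loads(line) for line in lines if line.strip()]
df = pd.json_normalize(data)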
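To see what the flattening changes, here is a small self-contained comparison. The sample records are made up for illustration and are parsed from a string rather than a file; it contrasts pd.json_normalize() with the plain pd.DataFrame() constructor on the same nested data.

```python
import json
import pandas as pd

# Hypothetical JSONL content with one nested object per line.
jsonl_text = (
    '{"A": {"B": 3235, "C": 2142}}\n'
    '{"A": {"B": 1, "C": 2}}\n'
)

# Parse each non-empty line into a Python dict.
records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

# json_normalize flattens nested dicts into dotted column names.
flat = pd.json_normalize(records)

# The plain constructor keeps each nested dict as a single cell value.
nested = pd.DataFrame(records)

print(list(flat.columns))    # ['A.B', 'A.C']
print(list(nested.columns))  # ['A']
```

If the dotted names are inconvenient, json_normalize also accepts a sep argument, for example sep="_" to get A_B and A_C.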

Upvotes: 0

Michael M.

Reputation: 11080

This Medium article provides a fairly simple answer, which can be made even shorter. All you need to do is read the file line by line and parse each line with json.loads(), like this:

import json
import pandas as pd


lines = []
with open(r'test.jsonl') as f:
    lines = f.read().splitlines()

line_dicts = [json.loads(line) for line in lines]
df_final = pd.DataFrame(line_dicts)

print(df_final)

As cgobat pointed out in a comment, the Medium article includes a few unnecessary steps, which this answer leaves out.

Upvotes: 6
