Reputation: 399
I want to import a JSON lines file into pandas. I tried to import it like a regular JSON file, but it did not work:
js = pd.read_json(r'C:\Users\Name\Downloads\profilenotes.jsonl')
Upvotes: 6
Views: 8080
Reputation: 109696
The pd.read_json() function in pandas reads JSON data into a DataFrame. For a JSON Lines (JSONL) file, where each line is a separate JSON object, pass lines=True so that each line is parsed as its own record.
df = pd.read_json("test.jsonl", lines=True)
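If the file is large, you can also pass the chunksize parameter (together with lines=True) to read it in chunks. In that case read_json returns an iterator of DataFrames instead of a single DataFrame.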
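As a rough sketch of chunked reading: the file contents, the column names, and the chunk size of 2 below are made up for the demo, and using the reader as a context manager assumes pandas 1.2 or newer.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical sample data, written to a temporary JSONL file for the demo.
records = [{"id": i, "note": f"note {i}"} for i in range(5)]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as tmp:
    for rec in records:
        tmp.write(json.dumps(rec) + "\n")
    path = tmp.name

# With lines=True and chunksize set, read_json yields DataFrames of up to
# `chunksize` rows each instead of loading the whole file at once.
chunk_sizes = []
with pd.read_json(path, lines=True, chunksize=2) as reader:
    for chunk in reader:
        chunk_sizes.append(len(chunk))

os.remove(path)
print(chunk_sizes)  # [2, 2, 1]
```

Each chunk is an ordinary DataFrame, so you can filter or aggregate it inside the loop and keep only what you need.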
Upvotes: 16
Reputation: 1
If your lines contain nested objects like {"A": {"B": 3235, "C": 2142}} and you want the DataFrame to have columns like A.B and A.C instead of a single column A holding {"B": 3235, "C": 2142}, you can flatten them like this:
import json
import pandas as pd

with open(r'test.jsonl') as f:
    lines = f.read().splitlines()
# Parse each line into a dict
data = [json.loads(line) for line in lines]
# json_normalize flattens nested dicts into dotted column names (A.B, A.C)
df = pd.json_normalize(data)
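To see what the flattening does without needing a file, here is a small sketch on in-memory records. The second record is invented for illustration. pd.json_normalize joins nested keys with "." by default, and its sep argument changes the separator.

```python
import pandas as pd

# Hypothetical records, as they would look after json.loads() on each line.
data = [
    {"A": {"B": 3235, "C": 2142}},
    {"A": {"B": 1, "C": 2}},
]

# Nested keys become dotted column names by default.
flat = pd.json_normalize(data)
print(list(flat.columns))  # ['A.B', 'A.C']

# The separator can be changed if dots are inconvenient in column names.
flat_underscore = pd.json_normalize(data, sep="_")
print(list(flat_underscore.columns))  # ['A_B', 'A_C']
```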
Upvotes: 0
Reputation: 11080
This Medium article provides a fairly simple answer, which can be shortened further. All you need to do is read each line and parse it with json.loads(), like this:
import json
import pandas as pd

with open(r'test.jsonl') as f:
    lines = f.read().splitlines()
line_dicts = [json.loads(line) for line in lines]
df_final = pd.DataFrame(line_dicts)
print(df_final)
As cgobat pointed out in a comment, the Medium article includes a few unnecessary steps, which have been removed in this answer.
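One possible variation on the line-by-line approach, if the file might contain blank lines (which json.loads() rejects): iterate over the file handle and skip empty lines. The read_jsonl helper and the sample content are hypothetical. io.StringIO stands in for an open file here.

```python
import io
import json

import pandas as pd

# Hypothetical JSONL content with a blank line in the middle.
raw = '{"id": 1, "note": "a"}\n\n{"id": 2, "note": "b"}\n'


def read_jsonl(handle):
    """Yield one parsed object per non-blank line of an open text file."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


# io.StringIO(raw) behaves like the handle returned by open('test.jsonl').
df_clean = pd.DataFrame(list(read_jsonl(io.StringIO(raw))))
print(df_clean.shape)  # (2, 2)
```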
Upvotes: 6