Reputation: 1785
I have a large JSONL file (~100 GB). I want to convert it to a pandas DataFrame and apply some functions to a column by iterating over all the rows.
What's the best way to read this JSONL file? I am currently doing the following, but it gets stuck (running on GCP):
import pandas as pd
import json

data = []
with open("my_jsonl_file", "r") as file:
    for line in file:
        data.append(json.loads(line))
Upvotes: 0
Views: 2264
Reputation: 820
For smaller data you can simply use:
import pandas as pd
path = "test.jsonl"
data = pd.read_json(path, lines=True)
For large data, you can stream the file with jsonlines and build the DataFrame once at the end (calling append on a DataFrame inside a loop is very slow, and DataFrame.append was removed in pandas 2.0):

import jsonlines

rows = []
with jsonlines.open(path) as reader:
    for line in reader:
        # line is already a parsed dict; extract the field(s) you need here
        rows.append({'c1': line})

df = pd.DataFrame(rows)
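If you would rather stay in pandas, read_json can also read the file in chunks so the whole ~100 GB never has to sit in one DataFrame at once. A minimal sketch, assuming the path, chunk size, column name 'c1', and the process function are placeholders for your own:

import pandas as pd

path = "my_jsonl_file"  # placeholder path

def process(value):
    # placeholder for whatever you want to do with each value
    return value

results = []
# chunksize together with lines=True makes read_json return an iterator of DataFrames
for chunk in pd.read_json(path, lines=True, chunksize=100_000):
    # apply the function to the column of interest, one chunk at a time
    results.append(chunk['c1'].apply(process))

This keeps only one chunk in memory at a time, which is what matters for a file of that size.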
Upvotes: 1