Ram K

Reputation: 1785

Reading a large json file into a pandas dataframe

I have a large JSONL file (~100 GB). I want to convert it to a pandas DataFrame and apply some functions to a column by iterating over all the rows.

What's the best way to read this JSONL file? I am currently doing the following, but it gets stuck (running on GCP):

import pandas as pd
import json
data = []
with open("my_jsonl_file", "r") as file:
    for line in file:
        data.append(json.loads(line))

Upvotes: 0

Views: 2264

Answers (1)

Bilal Dadanlar

Reputation: 820

For smaller files you can simply use:

import pandas as pd
path = "test.jsonl"
data = pd.read_json(path, lines=True)

For large data, you can use something like this (note that `DataFrame.append` is deprecated and appending row by row is slow; collect the rows in a list and build the DataFrame once at the end):

import jsonlines
import pandas as pd

rows = []
with jsonlines.open(path) as reader:
    for line in reader:
        # process the parsed object in `line`, then collect it
        rows.append({'c1': line})

df = pd.DataFrame(rows, columns=['c1'])
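For a file around 100 GB, it can also help to let pandas itself read the file in chunks with `read_json(..., lines=True, chunksize=...)`, so only one chunk is in memory at a time. Below is a minimal sketch; the sample file, the `c1` column, and the doubling function are stand-ins for your real data and per-row logic:

```python
import json
import pandas as pd

# Create a tiny sample JSONL file so the sketch is runnable;
# in practice this would be the existing ~100 GB file.
path = "sample.jsonl"
with open(path, "w") as f:
    for i in range(5):
        f.write(json.dumps({"c1": i}) + "\n")

# chunksize=2 is tiny for the demo; something like 100_000
# is more realistic for a large file.
results = []
for chunk in pd.read_json(path, lines=True, chunksize=2):
    # Apply the per-row function to the column of interest;
    # 'c1' and the lambda are placeholders for the real logic.
    results.append(chunk["c1"].apply(lambda x: x * 2))

processed = pd.concat(results, ignore_index=True)
print(processed.tolist())
```

Since each chunk is processed and discarded before the next is read, peak memory stays roughly proportional to the chunk size rather than the file size.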

Upvotes: 1

Related Questions