Alicia_2024
Alicia_2024

Reputation: 531

Save pandas dataframe to json by column value

I have a dataframe with 10000 rows that look like below:

    import numpy as np
df=pd.DataFrame(np.array([['facebook', '15',"women tennis"], ['facebook', '20',"men basketball"], ['facebook', '30','club'],
                          ['apple', "10","vice president"], ['apple', "100",'swimming contest']]),columns=['firm','id','text'])

enter image description here

I'd like to save each firm into a separate JSON file. So the json file for Facebook looks like below, with the file name written as "firm.json" (e.g. facebook.json). The same will be for other firms, such as Apple.

enter image description here

Sorry, I am still a beginner to Pandas, is there a way to do so effectively?

Upvotes: 1

Views: 712

Answers (3)

SomeDude
SomeDude

Reputation: 14228

You can do:

json_cols = df.columns.drop('firm').tolist()
json_records = df.groupby('firm')[json_cols].apply(
                          lambda x:x.to_json(orient='records'))

Then for 'facebook':

facebook_json = json_records['facebook']

'[{"id":"15","text":"women tennis"},
  {"id":"20","text":"men basketball"},
  {"id":"30","text":"club"}]'

for 'apple':

apple_json = json_records['apple']

'[{"id":"10","text":"vice president"},{"id":"100","text":"swimming contest"}]'

Save all at once

for col, records in json_records.iteritems():
    with open(f"{col}.json", "w") as file:
        file.write(records)

Upvotes: 3

Adrian Ang
Adrian Ang

Reputation: 580

Here's one way to do it:

import json
import numpy as np

df=pd.DataFrame(np.array([['facebook', '15',"women tennis"], ['facebook', '20',"men basketball"], ['facebook', '30','club'],
                          ['apple', "10","vice president"], ['apple', "100",'swimming contest']]),columns=['firm','id','text'])

for firm in set(df['firm']):
    f = open(firm + '.json', 'w')
    f.write(json.dumps(list(df[df['firm']==firm][['id', 'text']].T.to_dict().values())))
    f.close()

Output:

apple.json
[{"id": "10", "text": "vice president"}, {"id": "100", "text": "swimming contest"}]

facebook.json
[{"id": "15", "text": "women tennis"}, {"id": "20", "text": "men basketball"}, {"id": "30", "text": "club"}]

Upvotes: 1

Michael Hodel
Michael Hodel

Reputation: 3005

import pandas as pd
import numpy as np
import json

df = pd.DataFrame(
    np.array([
        ['facebook', '15',"women tennis"],
        ['facebook', '20',"men basketball"],
        ['facebook', '30','club'],
        ['apple', "10","vice president"],
        ['apple', "100",'swimming contest']]
    ), columns=['firm','id','text']
)

firms = set(df.firm)
for firm in firms:
    df_firm = df[df.firm == firm]
    d = []
    for _, r in df_firm.iterrows():
        d.append({'id': r.id, 'Text': str(r.text)})
    with open(f'{firm}.json', 'w') as f:
        json.dump(d, f)

I'm sure there's a simpler way, but thats one way.

Upvotes: 1

Related Questions