Converting csv to json file format . not sure how to achieve the expected output in json using pandas or python

Question

How to convert flat file to json format using pandas or python. I have tried below code to read csv file and explicitly creating address column.

Code Tried

df = pd.read_csv(input_file, sep=",", header=0)
    df['address'] = df.apply(
        lambda x: dict(line1=x['line1'], line2=x['line2'], line3=x['line3']), axis=1)
    print(df)

Input Data file

id,name,line1,line2,line3
5,"ABC","123","456",67
5,"ABC","456","456",67

Output Getting in json format

[
    {
    "source_id":5,
    "name":"ABC",
    "line1":123,
    "line2":456,
    "line3":67,
    "address":{
            "line1":123,
            "line2":456,
            "line3":67
            }
    },
    {
    "source_id":5,
    "name":"ABC",
    "line1":456,
    "line2":456,
    "line3":67
    "address":{
            "line1":456,
            "line2":456,
            "line3":67
            }
    }
]

Expected Output

[
    {
        "source_id": 5,
        "name":"ABC",
        "address":[
                {
                "line1": 123,
                "line2": 456,
                "line3": 67
                },
                {
                "line1": 456,
                "line2": 456,
                "line3": 67
                }
            ]
    }
]

Anurag Dabas · Accepted Answer

You can try:

df=df.rename(columns={'id':'source_id'})
g_cols=['source_id','fn','ln','email','dob']
cols=df.columns[~df.columns.isin(g_cols)]
out=(df.groupby(g_cols)[cols]
       .apply(lambda x:x.to_dict('records'))
       .reset_index(name='address').to_dict('records'))

output of out:

[{'source_id': 5,
  'fn': 'sam',
  'ln': 'joe',
  'email': 'sam123@gmail.com',
  'dob': '05-12-1984',
  'address': [{'ln1': 123,
    'ln2': 456,
    'ln3': 67,
    'cty': 'xxx',
    'state': 'tn',
    'cntry': 'india'},
   {'ln1': 456,
    'ln2': 456,
    'ln3': 67,
    'cty': 'xxx',
    'state': 'tn',
    'cntry': 'india'},
   {'ln1': 123,
    'ln2': 789,
    'ln3': 67,
    'cty': 'xxx',
    'state': 'tn',
    'cntry': 'india'}]},
 {'source_id': 7,
  'fn': 'priya',
  'ln': 'kannan',
  'email': 'shanthapriya794@gmail.com',
  'dob': '07-12-1994',
  'address': [{'ln1': 123,
    'ln2': 456,
    'ln3': 67,
    'cty': 'mdu',
    'state': 'tn',
    'cntry': 'india'}]}]

Converting csv to json file format . not sure how to achieve the expected output in json using pandas or python

Answers (2)

Related Questions