Naveen
Naveen

Reputation: 81

Converting csv to json file format . not sure how to achieve the expected output in json using pandas or python

How to convert flat file to json format using pandas or python. I have tried below code to read csv file and explicitly creating address column.

Code Tried

df = pd.read_csv(input_file, sep=",", header=0)
    df['address'] = df.apply(
        lambda x: dict(line1=x['line1'], line2=x['line2'], line3=x['line3']), axis=1)
    print(df)

Input Data file

id,name,line1,line2,line3
5,"ABC","123","456",67
5,"ABC","456","456",67

Output Getting in json format

[
    {
    "source_id":5,
    "name":"ABC",
    "line1":123,
    "line2":456,
    "line3":67,
    "address":{
            "line1":123,
            "line2":456,
            "line3":67
            }
    },
    {
    "source_id":5,
    "name":"ABC",
    "line1":456,
    "line2":456,
    "line3":67
    "address":{
            "line1":456,
            "line2":456,
            "line3":67
            }
    }
]

Expected Output

[
    {
        "source_id": 5,
        "name":"ABC",
        "address":[
                {
                "line1": 123,
                "line2": 456,
                "line3": 67
                },
                {
                "line1": 456,
                "line2": 456,
                "line3": 67
                }
            ]
    }
]

Upvotes: 0

Views: 99

Answers (2)

Naveen
Naveen

Reputation: 81

path = config['path']['input_file_path']
json_path = config['json_path']['input_json_path']
g_cols = ['source_id', 'fname', 'lname', 'email', 'date_of_birth']
df = pd.read_csv(path, sep=",", header=0)
with open(json_path, 'r') as f:
    dict_val = json.load(f)
df.columns = df.columns.to_series().map(dict_val)
cols= df.columns[~df.columns.isin(g_cols)]
g_cols.remove('email')
df1 = (df.sort_values(g_cols).set_index(g_cols).assign(email=df.groupby(g_cols)['email'].agg(lambda x: tuple(pd.unique(x)))).reset_index())
g_cols.append('email')
df2 = df1.groupby(g_cols)[cols].apply(lambda x: x.to_dict('records')).reset_index(name='address').to_dict('record')
df3 = pd.DataFrame(df2)

Upvotes: 0

Anurag Dabas
Anurag Dabas

Reputation: 24322

You can try:

df=df.rename(columns={'id':'source_id'})
g_cols=['source_id','fn','ln','email','dob']
cols=df.columns[~df.columns.isin(g_cols)]
out=(df.groupby(g_cols)[cols]
       .apply(lambda x:x.to_dict('records'))
       .reset_index(name='address').to_dict('records'))

output of out:

[{'source_id': 5,
  'fn': 'sam',
  'ln': 'joe',
  'email': '[email protected]',
  'dob': '05-12-1984',
  'address': [{'ln1': 123,
    'ln2': 456,
    'ln3': 67,
    'cty': 'xxx',
    'state': 'tn',
    'cntry': 'india'},
   {'ln1': 456,
    'ln2': 456,
    'ln3': 67,
    'cty': 'xxx',
    'state': 'tn',
    'cntry': 'india'},
   {'ln1': 123,
    'ln2': 789,
    'ln3': 67,
    'cty': 'xxx',
    'state': 'tn',
    'cntry': 'india'}]},
 {'source_id': 7,
  'fn': 'priya',
  'ln': 'kannan',
  'email': '[email protected]',
  'dob': '07-12-1994',
  'address': [{'ln1': 123,
    'ln2': 456,
    'ln3': 67,
    'cty': 'mdu',
    'state': 'tn',
    'cntry': 'india'}]}]

Upvotes: 2

Related Questions