Rei M
Rei M

Reputation: 25

converting json with different format to csv

I know this is has been asked many times but I still have no solution to my problem and I'm not that good at programming so basically I have so many json files but the format of the json is very different so I can't get the output that I want with pandas library , here is an example of the json file .

[
    {
        "Names": [
            "James",
            "Bob",
            "David"
        ],
        "Salary": [
            "2000$",
            "5000$",
            "6000$"
        ],
        "Id": [
            "1",
            "2",
            "3"
        ]
    },
    {
        "Names": [
            "John",
            "Charles",
            "Harry"
        ],
        "Salary": [
            "1000$",
            "2000$",
            "3000$"
        ],
        "Id": [
            "4",
            "5",
            "6"
        ]
    }
]

when I did convert this with pandas using this simple code:

import json
import pandas as pd

df=pd.read_json("test.json")
df.to_csv("results.csv")

https://i.sstatic.net/jPlQx.png

the problem is the output of csv file give me all the 3 names in one single cell just like that ['James', 'Bob', 'David'] , ['2000$', '5000$', '6000$'] ... but I want only one name in the cell not all 3 of them I'm very new to these stuff I will appreciate any help

Upvotes: 2

Views: 79

Answers (5)

Bruno Vermeulen
Bruno Vermeulen

Reputation: 3465

I think you want each row to have the id, name and salary. You can achieve this as follows:

import pandas as pd

df=pd.read_json("test.json")

new_df = pd.DataFrame(columns=['id', 'name', 'salary'])
for _, row in df.iterrows():
    new_df = new_df.append(pd.DataFrame(
        {'id': row.Id, 'name': row.Names, 'salary': row.Salary}))

new_df.to_csv("results.csv")

result in results.csv

,id,name,salary
0,1,James,2000$
1,2,Bob,5000$
2,3,David,6000$
0,4,John,1000$
1,5,Charles,2000$
2,6,Harry,3000$

Basically the initial dataframe df has the id, names and salary data in lists, so what you do is to make a new dataframe new_df, then loop over dataframe df and append to new_df the dataframe (with same structure as df_new) that has the values of the rows properly under each column.

This will work no matter how long the lists in rows are as long as they are the same for Id, Names and Salary...

Upvotes: 0

Celius Stingher
Celius Stingher

Reputation: 18377

Yes, you are getting that answer because each value for the given key in the dictionary contains a list and not a single element. Therefore there are two levels you should address when working with that kind of json format.

data is a list that contains two dictionaries with the same keys each. Either dictionary contains keys that contain a list. Therefore we need to iterate over the first list (to address each dictionary) and then over the second to adress each value for any specified key. The output will be the table as you desire. It should be noted that this code will work as length of the list values of "Names" is the same as "Salary" and "Id".

import pandas as pd 
import numpy as np
data = [
    {
        "Names": ["James","Bob","David"],
        "Salary": ["2000$","5000$","6000$"],
        "Id": ["1","2","3"]},
    {
        "Names": ["John","Charles","Harry"],
        "Salary": ["1000$","2000$","3000$"],
        "Id": ["4","5","6"]}
]
to_df = {'Names':[],'Salary':[],'Id':[]}
for i in range(len(data)):
    for j in range(len(data[i]['Id'])):
        to_df['Names'].append(data[i]['Names'][j])
        to_df['Salary'].append(data[i]['Salary'][j])
        to_df['Id'].append(data[i]['Id'][j])
df = pd.DataFrame(to_df)
print(df)

Output:

     Names Salary Id
0    James  2000$  1
1      Bob  5000$  2
2    David  6000$  3
3     John  1000$  4
4  Charles  2000$  5
5    Harry  3000$  6

Upvotes: 1

r.ook
r.ook

Reputation: 13888

Your data input isn't structured properly for the output you wanted, you'll need to fix that first:

import json

with open('test.json', 'r') as file:
    data = json.load(file.read())

# flatten the data to {'Name': [...], 'Salary': [...], ...}
d = {}
for elem in data:
    for k, v in elem.items():
        d.setdefault(k, []).extend(v)

df = pd.DataFrame(d).reindex(columns=['Id', 'Names', 'Salary'])

Result:

  Id    Names Salary
0  1    James  2000$
1  2      Bob  5000$
2  3    David  6000$
3  4     John  1000$
4  5  Charles  2000$
5  6    Harry  3000$

Upvotes: 0

naive
naive

Reputation: 367

a = [
    {
        "Names": [
            "James",
            "Bob",
            "David"
        ],
        "Salary": [
            "2000$",
            "5000$",
            "6000$"
        ],
        "Id": [
            "1",
            "2",
            "3"
        ]
    },
    {
        "Names": [
            "John",
            "Charles",
            "Harry"
        ],
        "Salary": [
            "1000$",
            "2000$",
            "3000$"
        ],
        "Id": [
            "4",
            "5",
            "6"
        ]
    }
]

I think this might solve your problem:

col_names = [k for k,v in a[0].items()]
frames = [pd.io.json.json_normalize(a, str(col)) for col in col_names]
final_df = pd.concat(frames, axis = 1)
final_df.columns = col_names

Output: '

  Id Salary    Names
0  1  2000$    James
1  2  5000$      Bob
2  3  6000$    David
3  4  1000$     John
4  5  2000$  Charles
5  6  3000$    Harry

Upvotes: 0

Fnguyen
Fnguyen

Reputation: 1177

The problem isn't in the function but in the way the json is defined. The pandas output is therefore exactly as it should be.

Instead of reading it in differently you could simply format your dataframe further to show the output you want. At the moment each row for each column is a list, so you need to unnest:

import json
import pandas as pd
import numpy as np

df=pd.read_json('data.json')

def unnesting(df, explode):
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx

    return df1.join(df.drop(explode, 1), how='left')

unnesting(df,['Names','Salary','Id'])

The self-made unnesting function comes thanks to WeNYoBen

Upvotes: 0

Related Questions