Reputation: 25
I know this is has been asked many times but I still have no solution to my problem and I'm not that good at programming so basically I have so many json files but the format of the json is very different so I can't get the output that I want with pandas library , here is an example of the json file .
[
{
"Names": [
"James",
"Bob",
"David"
],
"Salary": [
"2000$",
"5000$",
"6000$"
],
"Id": [
"1",
"2",
"3"
]
},
{
"Names": [
"John",
"Charles",
"Harry"
],
"Salary": [
"1000$",
"2000$",
"3000$"
],
"Id": [
"4",
"5",
"6"
]
}
]
when I did convert this with pandas using this simple code:
import json
import pandas as pd
df=pd.read_json("test.json")
df.to_csv("results.csv")
https://i.sstatic.net/jPlQx.png
the problem is the output of csv file give me all the 3 names in one single cell just like that ['James', 'Bob', 'David'] , ['2000$', '5000$', '6000$'] ... but I want only one name in the cell not all 3 of them I'm very new to these stuff I will appreciate any help
Upvotes: 2
Views: 79
Reputation: 3465
I think you want each row to have the id, name and salary. You can achieve this as follows:
import pandas as pd
df=pd.read_json("test.json")
new_df = pd.DataFrame(columns=['id', 'name', 'salary'])
for _, row in df.iterrows():
new_df = new_df.append(pd.DataFrame(
{'id': row.Id, 'name': row.Names, 'salary': row.Salary}))
new_df.to_csv("results.csv")
result in results.csv
,id,name,salary
0,1,James,2000$
1,2,Bob,5000$
2,3,David,6000$
0,4,John,1000$
1,5,Charles,2000$
2,6,Harry,3000$
Basically the initial dataframe df
has the id, names and salary data in lists, so what you do is to make a new dataframe new_df
, then loop over dataframe df
and append to new_df
the dataframe (with same structure as df_new
) that has the values of the rows properly under each column.
This will work no matter how long the lists in rows are as long as they are the same for Id, Names and Salary...
Upvotes: 0
Reputation: 18377
Yes, you are getting that answer because each value for the given key in the dictionary contains a list and not a single element. Therefore there are two levels you should address when working with that kind of json format.
data
is a list that contains two dictionaries with the same keys each. Either dictionary contains keys that contain a list. Therefore we need to iterate over the first list (to address each dictionary) and then over the second to adress each value for any specified key. The output will be the table as you desire. It should be noted that this code will work as length of the list values of "Names"
is the same as "Salary"
and "Id"
.
import pandas as pd
import numpy as np
data = [
{
"Names": ["James","Bob","David"],
"Salary": ["2000$","5000$","6000$"],
"Id": ["1","2","3"]},
{
"Names": ["John","Charles","Harry"],
"Salary": ["1000$","2000$","3000$"],
"Id": ["4","5","6"]}
]
to_df = {'Names':[],'Salary':[],'Id':[]}
for i in range(len(data)):
for j in range(len(data[i]['Id'])):
to_df['Names'].append(data[i]['Names'][j])
to_df['Salary'].append(data[i]['Salary'][j])
to_df['Id'].append(data[i]['Id'][j])
df = pd.DataFrame(to_df)
print(df)
Output:
Names Salary Id
0 James 2000$ 1
1 Bob 5000$ 2
2 David 6000$ 3
3 John 1000$ 4
4 Charles 2000$ 5
5 Harry 3000$ 6
Upvotes: 1
Reputation: 13888
Your data input isn't structured properly for the output you wanted, you'll need to fix that first:
import json
with open('test.json', 'r') as file:
data = json.load(file.read())
# flatten the data to {'Name': [...], 'Salary': [...], ...}
d = {}
for elem in data:
for k, v in elem.items():
d.setdefault(k, []).extend(v)
df = pd.DataFrame(d).reindex(columns=['Id', 'Names', 'Salary'])
Result:
Id Names Salary
0 1 James 2000$
1 2 Bob 5000$
2 3 David 6000$
3 4 John 1000$
4 5 Charles 2000$
5 6 Harry 3000$
Upvotes: 0
Reputation: 367
a = [
{
"Names": [
"James",
"Bob",
"David"
],
"Salary": [
"2000$",
"5000$",
"6000$"
],
"Id": [
"1",
"2",
"3"
]
},
{
"Names": [
"John",
"Charles",
"Harry"
],
"Salary": [
"1000$",
"2000$",
"3000$"
],
"Id": [
"4",
"5",
"6"
]
}
]
I think this might solve your problem:
col_names = [k for k,v in a[0].items()]
frames = [pd.io.json.json_normalize(a, str(col)) for col in col_names]
final_df = pd.concat(frames, axis = 1)
final_df.columns = col_names
Output: '
Id Salary Names
0 1 2000$ James
1 2 5000$ Bob
2 3 6000$ David
3 4 1000$ John
4 5 2000$ Charles
5 6 3000$ Harry
Upvotes: 0
Reputation: 1177
The problem isn't in the function but in the way the json is defined. The pandas output is therefore exactly as it should be.
Instead of reading it in differently you could simply format your dataframe further to show the output you want. At the moment each row for each column is a list, so you need to unnest:
import json
import pandas as pd
import numpy as np
df=pd.read_json('data.json')
def unnesting(df, explode):
idx = df.index.repeat(df[explode[0]].str.len())
df1 = pd.concat([
pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
df1.index = idx
return df1.join(df.drop(explode, 1), how='left')
unnesting(df,['Names','Salary','Id'])
The self-made unnesting function comes thanks to WeNYoBen
Upvotes: 0