Reputation: 7500
I have a json file which I converted to dict like below:
{'DATA': [{'COMPANY_SCHEMA': 'ABC', 'CONFIG_TYPE': 'rtype', 'IM_ID': '44f8d1b4_437e', 'MODIFIED_DATE': 'Unknown', 'ID': 'Test', 'CONFIG_KEY': 'posting_f', 'SYSTEM_NUMBER': '50', 'SYS_CONFIG_VALUE': '0', 'SYS_CONFIG_STRING_VALUE': 'true'}
I wrote the following code to convert a json file to above dict format
with open('data.json') as data_file:
data = json.load(data_file)
Now I am trying to store this dict as pandas data frame with keys as column headers.
So I write below:
df=pd.DataFrame.from_dict(data,orient='columns')
But I get all columns as one column.
df.head(3)
DATA
0 {'COMPANY_SCHEMA': 'ABC.', 'CON...
1 {'COMPANY_SCHEMA': 'ABC', 'CON...
2 {'COMPANY_SCHEMA': 'ABC', 'CON...
I basically have a bunch of such json files in a folder and I am trying to read all of them and store in one pandas data frame appended one below the other.
So I was trying above. So
1) why the above error when converting to pandas data frame and
ii) Is there a better and faster way to read a bunch of such files and append to one json and then add it to pandas frame or one by one?
Upvotes: 0
Views: 7724
Reputation: 30050
Your data is broken. After analyzing your question, I construct one like following:
{'DATA': [{'COMPANY_SCHEMA': 'ABC', 'CONFIG_TYPE': 'rtype', 'IM_ID': '44f8d1b4_437e', 'MODIFIED_DATE': 'Unknown', 'ID': 'Test', 'CONFIG_KEY': 'posting_f', 'SYSTEM_NUMBER': '50', 'SYS_CONFIG_VALUE': '0', 'SYS_CONFIG_STRING_VALUE': 'true'}, {'COMPANY_SCHEMA': 'ABC', 'CONFIG_TYPE': 'rtype', 'IM_ID': '44f8d1b4_437e', 'MODIFIED_DATE': 'Unknown', 'ID': 'Test', 'CONFIG_KEY': 'posting_f', 'SYSTEM_NUMBER': '50', 'SYS_CONFIG_VALUE': '0', 'SYS_CONFIG_STRING_VALUE': 'true'}]}
Since you only give the converted dict and JSON specification - RFC7159 states that a string begins and ends with quotation mark which is "
. I just take the dict as an example.
I use ast.literal_eval() to safely get a data structure from a string, which is dict
same with your json.load()
. After getting a dict
object, there are various ways to convert it to dataframe.
import ast
import pandas as pd
with open('data.dict') as data_file:
dict_data = ast.literal_eval(data_file.read())
# The following methods all produce the same output:
pd.DataFrame(dict_data['DATA'])
pd.DataFrame.from_dict(dict_data['DATA'])
pd.DataFrame.from_records(dict_data['DATA'])
# print(pd.DataFrame(dict_data['DATA']))
COMPANY_SCHEMA CONFIG_TYPE IM_ID MODIFIED_DATE ID CONFIG_KEY SYSTEM_NUMBER SYS_CONFIG_VALUE SYS_CONFIG_STRING_VALUE
0 ABC rtype 44f8d1b4_437e Unknown Test posting_f 50 0 true
1 ABC rtype 44f8d1b4_437e Unknown Test posting_f 50 0 true
- why the above error when converting to pandas data frame
If you mean why there's only one column, that's pandas.DataFrame.from_dict() treats the keys of the dict as the DataFrame columns by default. If you do df=pd.DataFrame.from_dict(data)
, whose key is DATA
. So there is only one column named DATA
ii) Is there a better and faster way to read a bunch of such files and append to one json and then add it to pandas frame or one by one?
My solution is to concat all the dict data to one list:
with open('data1.json') as data_file:
dict_data1 = json.load(data_file)
....
data = dict_data1['DATA'] + dict_data2['DATA']
# Convert to pandas dataframe
pd.DataFrame(data)
# Dump the data to json file
with open('result.json', 'w') as fp:
json.dump({'DATA': data}, fp)
You could use a for loop to simplify the procedure.
Upvotes: 1
Reputation: 14851
The data you provide is broken, so it is hard to reproduce. Try to provide a reproducible case when asking! ;-)
Anyway I guess you just need to:
df = pandas.DataFrame(data['DATA'])
Where data
is the dictionary you created with json.load()
.
A pandas.DataFrame()
can be initialized with a list of dictionaries with no problem, but you need to pass the list of dictionaries.
If you are concerned about performance then yeah, append to your list of dictionaries first and convert the whole list to a DataFrame with pandas.DataFrame(list_of_dictionaries)
.
Upvotes: 1
Reputation: 2219
Not sure about why you are getting the error you show, but I would skip converting the json to a dictionary and just use pd.read_json()
instead.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html
Upvotes: 1