Reputation: 1409
I have a JSON file as follows:
{
"ALPHA": [
{
"date": "2021-06-22",
"constituents": {
"BBB": 0,
"EEE": 1,
"BTB": 1,
"YUY": 1
}
},
{
"date": "2021-09-07",
"constituents": {
"BBB": 0,
"EEE": 0,
"BTB": 0,
"YUY": 0
}
}
],
"BETA": [
{
"date": "2021-06-22",
"constituents": {
"BBB": 1,
"EEE": 1,
"BTB": 1,
"YUY": 1
}
},
{
"date": "2021-09-07",
"constituents": {
"BBB": 1,
"EEE": 1,
"BTB": 1,
"YUY": 1
}
}
],
"THETA": [
{
"date": "2021-06-22",
"constituents": {
"BBB": 0,
"EEE": 1,
"BTB": 1,
"YUY": 0
}
},
{
"date": "2021-08-20",
"constituents": {
"BBB": 0,
"EEE": 1,
"BTB": 1,
"YUY": 0
}
},
{
"date": "2021-09-07",
"constituents": {
"BBB": 0,
"EEE": 1,
"BTB": 1,
"YUY": 0
}
}
]
}
I want to read the above into a pandas data frame where the first index is the date, the second index is the first keys (i.e. "ALPHA", "BETA", "THETA"), the columns are the inner keys (i.e. "BBB" ,"EEE", "BTB" ,"YUY"), and the cell values are the values of these inner keys.
How can I read that into pandas from the JSON file?
Upvotes: 5
Views: 593
Reputation: 28699
I feel you get better performance, and potentially easier manipulation, if you deal with python native data structures outside Pandas, before pulling the final form into Pandas:
Let's flatten the nested dictionary into a single dictionary, using Python's tools:
container = []
for key, value in j.items(): # j is the main dictionary
for entry in value:
content = {'date': entry['date'],
'key': key,
# expand the nested constituent data
# this gets us a single dictionary
**entry['constituents']}
container.append(content)
print(container)
[{'date': '2021-06-22',
'key': 'ALPHA',
'BBB': 0,
'EEE': 1,
'BTB': 1,
'YUY': 1},
{'date': '2021-09-07',
'key': 'ALPHA',
'BBB': 0,
'EEE': 0,
'BTB': 0,
'YUY': 0},
{'date': '2021-06-22', 'key': 'BETA', 'BBB': 1, 'EEE': 1, 'BTB': 1, 'YUY': 1},
{'date': '2021-09-07', 'key': 'BETA', 'BBB': 1, 'EEE': 1, 'BTB': 1, 'YUY': 1},
{'date': '2021-06-22',
'key': 'THETA',
'BBB': 0,
'EEE': 1,
'BTB': 1,
'YUY': 0},
{'date': '2021-08-20',
'key': 'THETA',
'BBB': 0,
'EEE': 1,
'BTB': 1,
'YUY': 0},
{'date': '2021-09-07',
'key': 'THETA',
'BBB': 0,
'EEE': 1,
'BTB': 1,
'YUY': 0}]
Now, build the dataframe, and set the required columns as index:
pd.DataFrame(container).set_index(['date', 'key'])
BBB EEE BTB YUY
date key
2021-06-22 ALPHA 0 1 1 1
2021-09-07 ALPHA 0 0 0 0
2021-06-22 BETA 1 1 1 1
2021-09-07 BETA 1 1 1 1
2021-06-22 THETA 0 1 1 0
2021-08-20 THETA 0 1 1 0
2021-09-07 THETA 0 1 1 0
Upvotes: 1
Reputation: 23217
You can use pd.Series
to import the JSON to a Pandas series with ALPHA
, BETA
as index and elements as list. Then expand the list of JSON to individual JSON by .explode()
. Expand the inner JSON to dataframe by .apply()
+ pd.Series
.
Append date
as index by .set_index()
with append=True
; swap date
from second index to first index by .swaplevel()
.
Finally, take the column constituents
and further expand the inner JSON to dataframe by .apply()
+ pd.Series
, as follows:
(assume you have already loaded the JSON file into j
)
df = (pd.Series(j)
.explode()
.apply(pd.Series)
.set_index('date', append=True)
.swaplevel()['constituents']
.apply(pd.Series)
)
Data input:
j = {'ALPHA': [{'date': '2021-06-22',
'constituents': {'BBB': 0, 'EEE': 1, 'BTB': 1, 'YUY': 1}},
{'date': '2021-09-07',
'constituents': {'BBB': 0, 'EEE': 0, 'BTB': 0, 'YUY': 0}}],
'BETA': [{'date': '2021-06-22',
'constituents': {'BBB': 1, 'EEE': 1, 'BTB': 1, 'YUY': 1}},
{'date': '2021-09-07',
'constituents': {'BBB': 1, 'EEE': 1, 'BTB': 1, 'YUY': 1}}],
'THETA': [{'date': '2021-06-22',
'constituents': {'BBB': 0, 'EEE': 1, 'BTB': 1, 'YUY': 0}},
{'date': '2021-08-20',
'constituents': {'BBB': 0, 'EEE': 1, 'BTB': 1, 'YUY': 0}},
{'date': '2021-09-07',
'constituents': {'BBB': 0, 'EEE': 1, 'BTB': 1, 'YUY': 0}}]}
Output
print(df)
BBB EEE BTB YUY
date
2021-06-22 ALPHA 0 1 1 1
2021-09-07 ALPHA 0 0 0 0
2021-06-22 BETA 1 1 1 1
2021-09-07 BETA 1 1 1 1
2021-06-22 THETA 0 1 1 0
2021-08-20 THETA 0 1 1 0
2021-09-07 THETA 0 1 1 0
Upvotes: 4