Reputation: 311
I have a a JSON file formatted in the following nested way.
[
{
"unitCode": "ABCD",
"bedType": "Adult MT/MS",
"census": 13,
"subCensus": null,
"censusDetails": [],
"occupancy": 62,
"occupancyStar": null,
"occupancyAlertStatus": null,
"columns": [
{
"id": "blockedBeds",
"value": "1",
"hoverDetails": [
{
"id": "bedName",
"value": "23_1"
}
]
},
{
"id": "unOccupied",
"value": "2",
"hoverDetails": [
{
"id": "bedName",
"value": "20a_2"
},
{
"id": "bedName",
"value": "22a_1"
}
]
}
],
"codeEvents": null,
"codeEventDetails": null
},
{
"unitCode": "EFGH",
"bedType": "Adult MT/MS",
"census": 14,
"subCensus": null,
"censusDetails": [],
"occupancy": 61,
"occupancyStar": null,
"occupancyAlertStatus": null,
"columns": [
{
"id": "blockedBeds",
"value": "1",
"hoverDetails": [
{
"id": "bedName",
"value": "52_2"
}
]
},
{
"id": "unOccupied",
"value": "1",
"hoverDetails": [
{
"id": "bedName",
"value": "53_1"
}
]
}
],
"codeEvents": null,
"codeEventDetails": null
}
]
I am tryin to flatten this file and convert it to a dataframe with json_normalize
.
Here is my code:
testhover = json_normalize(data, ['columns'],['unitCode'])
The dataframe that I get looks as follows:
id | value | hoverDetails | unitCode
0 blockedBeds | 1 | [{'id': 'bedName', 'value': '23_1'}] | ABCD
1 unOccupied | 2 | [{'id': 'bedName', 'value': '20a_2'}, {'id': '...' | ABCD
2 blockedBeds | 1 | [{'id': 'bedName', 'value': '52_2'}] | EFGH
3 unOccupied | 1 | [{'id': 'bedName', 'value': '53_1'}] | EFGH
I need it in the following format:
blockedBeds | unOccupied | unitCode
0 | '23_1' | NaN | ABCD
1 | NaN | '20a_2' | ABCD
2 | NaN | '22a_1' | ABCD
3 | '52_2' | NaN | EFGH
4 | NaN | '53_1' | EFGH
I cannot seem get to the nested bed data. I would really appreciate the help.
Upvotes: 0
Views: 546
Reputation: 3010
You should create a list of dicts from a loop and use that to create the dataframe.
vals = []
for item in parsed_json:
unit_code = item['unitCode']
for col in item['columns']:
for hd in col['hoverDetails']:
vals.append({'unitCode': unit_code,
col['id']: hd['value']})
pd.DataFrame(vals)
Output
unitCode blockedBeds unOccupied
0 ABCD 23_1 NaN
1 ABCD NaN 20a_2
2 ABCD NaN 22a_1
3 EFGH 52_2 NaN
4 EFGH NaN 53_1
Upvotes: 3