Stefan
Stefan

Reputation: 311

JSON_Normalize Pandas with lists

I have a a JSON file formatted in the following nested way.

[
  {
    "unitCode": "ABCD",
    "bedType": "Adult MT/MS",
    "census": 13,
    "subCensus": null,
    "censusDetails": [],
    "occupancy": 62,
    "occupancyStar": null,
    "occupancyAlertStatus": null,
    "columns": [
      {
        "id": "blockedBeds",
        "value": "1",
        "hoverDetails": [
          {
            "id": "bedName",
            "value": "23_1"
          }
        ]
      },
      {
        "id": "unOccupied",
        "value": "2",
        "hoverDetails": [
          {
            "id": "bedName",
            "value": "20a_2"
          },
          {
            "id": "bedName",
            "value": "22a_1"
          }
        ]
      }
    ],
    "codeEvents": null,
    "codeEventDetails": null
  },
  {
    "unitCode": "EFGH",
    "bedType": "Adult MT/MS",
    "census": 14,
    "subCensus": null,
    "censusDetails": [],
    "occupancy": 61,
    "occupancyStar": null,
    "occupancyAlertStatus": null,
    "columns": [
      {
        "id": "blockedBeds",
        "value": "1",
        "hoverDetails": [
          {
            "id": "bedName",
            "value": "52_2"
          }
        ]
      },
      {
        "id": "unOccupied",
        "value": "1",
        "hoverDetails": [
          {
            "id": "bedName",
            "value": "53_1"
          }
        ]
      }
    ],
    "codeEvents": null,
    "codeEventDetails": null
  }
]

I am tryin to flatten this file and convert it to a dataframe with json_normalize. Here is my code: testhover = json_normalize(data, ['columns'],['unitCode'])

The dataframe that I get looks as follows:

    id          | value |   hoverDetails                                       | unitCode
0   blockedBeds | 1     |   [{'id': 'bedName', 'value': '23_1'}]               | ABCD
1   unOccupied  | 2     |   [{'id': 'bedName', 'value': '20a_2'}, {'id': '...' | ABCD
2   blockedBeds | 1     |   [{'id': 'bedName', 'value': '52_2'}]               | EFGH
3   unOccupied  | 1     |   [{'id': 'bedName', 'value': '53_1'}]               | EFGH

I need it in the following format:

    blockedBeds   |  unOccupied  |   unitCode
0 | '23_1'        |  NaN         |   ABCD
1 | NaN           |  '20a_2'     |   ABCD
2 | NaN           |  '22a_1'     |   ABCD
3 | '52_2'        |  NaN         |   EFGH
4 | NaN           |  '53_1'      |   EFGH

I cannot seem get to the nested bed data. I would really appreciate the help.

Upvotes: 0

Views: 546

Answers (1)

Eric Truett
Eric Truett

Reputation: 3010

You should create a list of dicts from a loop and use that to create the dataframe.

vals = []

for item in parsed_json:
    unit_code = item['unitCode']
    for col in item['columns']:
        for hd in col['hoverDetails']:
            vals.append({'unitCode': unit_code,
                          col['id']: hd['value']})

pd.DataFrame(vals)

Output

  unitCode blockedBeds unOccupied
0     ABCD        23_1        NaN
1     ABCD         NaN      20a_2
2     ABCD         NaN      22a_1
3     EFGH        52_2        NaN
4     EFGH         NaN       53_1

Upvotes: 3

Related Questions