JavaMan
JavaMan

Reputation: 534

Completely Flatten JSON with nested list using Python Pandas

Here is the example JSON:

{
    "ApartmentBuilding":{
        "Address":{
            "HouseNumber": 5,
            "Street": "DataStreet",
            "ZipCode": 5100
        },
        "Apartments":[
            {
                "Number": 1,
                "Price": 500,
                "Residents": [
                    {
                        "Name": "Bob",
                        "Age": 43
                    },
                    {
                        "Name": "Alice",
                        "Age": 42
                    }
                ]
            },
            {
                "Number": 2,
                "Price": 750,
                "Residents": [
                    {
                        "Name": "Jane",
                        "Age": 43
                    },
                    {
                        "Name": "William",
                        "Age": 42
                    }
                ]
            },
            {
                "Number": 3,
                "Price": 1000,
                "Residents": []
            }
        ]
    }
}

I used below function from : Python Pandas - Flatten Nested JSON

import json
import pandas as pd

def flatten_json(nested_json: dict, exclude: list=['']) -> dict:
    """
    Flatten a list of nested dicts.
    """
    out = dict()
    def flatten(x: (list, dict, str), name: str='', exclude=exclude):
        if type(x) is dict:
            for a in x:
                if a not in exclude:
                    flatten(x[a], f'{name}{a}_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, f'{name}{i}_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out

with open("apartments.json") as f:
    data = json.load(f)

print(data)

df = pd.DataFrame([flatten_json(x) for x in data['ApartmentBuilding']])

print(df)

I am aiming to completely flatten the JSON to be transformed into a Panda Dataframe however, get a strange output as belows:

0     Address
1  Apartments

I am basically after something flatten like this:

enter image description here

Upvotes: 1

Views: 229

Answers (1)

Laurent
Laurent

Reputation: 13478

Here is another way to do it using Pandas json_normalize and explode:

import json

import pandas as pd

with open("file.json") as f:
    data = json.load(f)

df = pd.json_normalize(data["ApartmentBuilding"])

First, get and flatten the different parts of the dataframe:

building = df.explode("Apartments").reset_index(drop=True)

apartments = pd.DataFrame(building["Apartments"].to_dict()).T.explode("Residents")

residents = pd.DataFrame(
    apartments["Residents"].dropna().reset_index(drop=True).to_list()
)

Then, build the dataframe back by merging and concatenating the flattened parts:

new_df = pd.merge(
    left=building.loc[:, building.columns != "Apartments"],
    right=apartments.loc[:, apartments.columns != "Residents"],
    right_index=True,
    left_index=True,
).reset_index(drop=True)

new_df = pd.concat([new_df, residents], axis=1)

Do a little cleanup:

new_df.columns = [col.replace("Address.", "") for col in new_df.columns]

And finally:

print(new_df)
# Output
   HouseNumber      Street  ZipCode Number Price     Name   Age
0            5  DataStreet     5100      1   500      Bob  43.0
1            5  DataStreet     5100      1   500    Alice  42.0
2            5  DataStreet     5100      2   750     Jane  43.0
3            5  DataStreet     5100      2   750  William  42.0
4            5  DataStreet     5100      3  1000      NaN   NaN

Upvotes: 2

Related Questions