Jack Rolph
Jack Rolph

Reputation: 595

Pandas: MultiIndex from Nested Dictionary

Suppose I have a nested dictionary of the format:

dictionary={
     "A":[1, 2],
     "B":[2, 3],
     "Coords":[{
        "X":[1,2,3],
        "Y":[1,2,3],
        "Z":[1,2,3],
     },{
        "X":[2,3],
        "Y":[2,3],
        "Z":[2,3],
     }]
 }

How can I turn this into a Pandas MultiIndex Dataframe?

Equivalently, how can I produce a Dataframe where the information in the row is not duplicated for every co-ordinate?

In what I imagine, the two rows of output DataFrame should appear as follows:

Index A   B   Coords 
---------------------
0      1   2   X Y Z
              1 1 1
              2 2 2
              3 3 3
--------------------
---------------------
1      2   3   X Y Z
              2 2 2
              3 3 3
--------------------

Upvotes: 1

Views: 232

Answers (1)

tlentali
tlentali

Reputation: 3455

From your dictionary :

>>> import pandas as pd

>>> df = pd.DataFrame.from_dict(dictionary)
>>> df
    A   B   Coords
0   1   2   {'X': [1, 2, 3], 'Y': [1, 2, 3], 'Z': [1, 2, 3]}
1   2   3   {'X': [2, 3], 'Y': [2, 3], 'Z': [2, 3]}

Then we can use pd.Series to extract the data in dict in the column Coords like so :

df_concat = pd.concat([df.drop(['Coords'], axis=1), df['Coords'].apply(pd.Series)], axis=1)
>>> df_concat
    A   B   X           Y           Z
0   1   2   [1, 2, 3]   [1, 2, 3]   [1, 2, 3]
1   2   3   [2, 3]      [2, 3]      [2, 3]

To finish we use the explode method to get the list as rows and set the index on columns A and B to get the expected result :

>>> df_concat.explode(['X', 'Y', 'Z']).reset_index().set_index(['index', 'A', 'B'])
                X   Y   Z
index   A   B           
0       1   2   1   1   1
            2   2   2   2
            2   3   3   3
1       2   3   2   2   2
            3   3   3   3

UPDATE :

If you are using a version of Pandas lower than 1.3.0, we can use the trick given by @MillerMrosek in this answer :

def explode(df, columns):
    df['tmp']=df.apply(lambda row: list(zip(*[row[_clm] for _clm in columns])), axis=1) 
    df=df.explode('tmp')
    df[columns]=pd.DataFrame(df['tmp'].tolist(), index=df.index)
    df.drop(columns='tmp', inplace=True)
    return df

explode(df_concat, ["X", "Y", "Z"]).reset_index().set_index(['index', 'A', 'B'])

Output :

                X   Y   Z
index   A   B           
0       1   2   1   1   1
            2   2   2   2
            2   3   3   3
1       2   3   2   2   2
            3   3   3   3

Upvotes: 3

Related Questions