Reputation: 595
Suppose I have a nested dictionary of the format:
dictionary={
"A":[1, 2],
"B":[2, 3],
"Coords":[{
"X":[1,2,3],
"Y":[1,2,3],
"Z":[1,2,3],
},{
"X":[2,3],
"Y":[2,3],
"Z":[2,3],
}]
}
How can I turn this into a Pandas MultiIndex Dataframe?
Equivalently, how can I produce a Dataframe where the information in the row is not duplicated for every co-ordinate?
In what I imagine, the two rows of output DataFrame should appear as follows:
Index A B Coords
---------------------
0 1 2 X Y Z
1 1 1
2 2 2
3 3 3
--------------------
---------------------
1 2 3 X Y Z
2 2 2
3 3 3
--------------------
Upvotes: 1
Views: 232
Reputation: 3455
From your dictionary :
>>> import pandas as pd
>>> df = pd.DataFrame.from_dict(dictionary)
>>> df
A B Coords
0 1 2 {'X': [1, 2, 3], 'Y': [1, 2, 3], 'Z': [1, 2, 3]}
1 2 3 {'X': [2, 3], 'Y': [2, 3], 'Z': [2, 3]}
Then we can use pd.Series
to extract the data in dict in the column Coords
like so :
df_concat = pd.concat([df.drop(['Coords'], axis=1), df['Coords'].apply(pd.Series)], axis=1)
>>> df_concat
A B X Y Z
0 1 2 [1, 2, 3] [1, 2, 3] [1, 2, 3]
1 2 3 [2, 3] [2, 3] [2, 3]
To finish we use the explode
method to get the list as rows and set the index on columns A
and B
to get the expected result :
>>> df_concat.explode(['X', 'Y', 'Z']).reset_index().set_index(['index', 'A', 'B'])
X Y Z
index A B
0 1 2 1 1 1
2 2 2 2
2 3 3 3
1 2 3 2 2 2
3 3 3 3
UPDATE :
If you are using a version of Pandas lower than 1.3.0, we can use the trick given by @MillerMrosek in this answer :
def explode(df, columns):
df['tmp']=df.apply(lambda row: list(zip(*[row[_clm] for _clm in columns])), axis=1)
df=df.explode('tmp')
df[columns]=pd.DataFrame(df['tmp'].tolist(), index=df.index)
df.drop(columns='tmp', inplace=True)
return df
explode(df_concat, ["X", "Y", "Z"]).reset_index().set_index(['index', 'A', 'B'])
Output :
X Y Z
index A B
0 1 2 1 1 1
2 2 2 2
2 3 3 3
1 2 3 2 2 2
3 3 3 3
Upvotes: 3