Reputation: 17631
Consider the following dictionary of dictionaries in python3.x
dict1 = {4: {4:25, 5:39, 3:42}, 5:{24:94, 252:49, 25:4, 55:923}}
I would like to unfold this into a pandas DataFrame. There appear to be two options:
df1 = pd.DataFrame.from_dict(dict1, orient='columns')
print(df1)
4 5
3 42.0 NaN
4 25.0 NaN
5 39.0 NaN
24 NaN 94.0
25 NaN 4.0
55 NaN 923.0
252 NaN 49.0
whereby the columns for this are the main dictionary keys 4
and `5', the row indices are the subdictionary keys and the values are the subdictionary values.
The other option is
df2 = pd.DataFrame.from_dict(dict1, orient='index')
print(df2)
4 5 3 24 252 25 55
4 25.0 39.0 42.0 NaN NaN NaN NaN
5 NaN NaN NaN 94.0 49.0 4.0 923.0
whereby the columns are the keys of the inner "sub-dictionary", the row indices are the keys of the main dictionary, and the values are the subdictionary keys.
Is there a standard approach which allows us to unfold the python dictionary as follows?
key inner_key values
4 3 42
4 4 25
4 5 39
5 24 94
5 25 4
5 55 923
5 252 49
It would be best not to manipulate the DataFrame after using from_dict()
, as for far larger python dictionaries, this could become quite memory intensive.
Upvotes: 2
Views: 815
Reputation: 4638
pd.DataFrame([[i,j,user_dict[i][j] ] for i in user_dict.keys() for j in user_dict[i].keys()],columns=['key', 'inner_key', 'values'])
Output:
key inner_key values
0 4 4 25
1 4 5 39
2 4 3 42
3 5 24 94
4 5 252 49
5 5 25 4
6 5 55 923
Upvotes: 1
Reputation: 164683
A list comprehension should be fairly efficient:
dict1 = {4: {4:25, 5:39, 3:42}, 5: {24:94, 252:49, 25:4, 55:923}}
cols = ['key', 'inner_key', 'values']
df = pd.DataFrame([[k1, k2, v2] for k1, v1 in dict1.items() for k2, v2 in v1.items()],
columns=cols).sort_values(cols)
print(df)
key inner_key values
2 4 3 42
0 4 4 25
1 4 5 39
3 5 24 94
5 5 25 4
6 5 55 923
4 5 252 49
pd.melt
+ dropna
If you don't mind working from df1
, you can unpivot your dataframe via pd.melt
and then drop rows with null value
.
df1 = df1.reset_index()
res = pd.melt(df1, id_vars='index', value_vars=[4, 5])\
.dropna(subset=['value']).astype(int)
print(res)
index variable value
0 3 4 42
1 4 4 25
2 5 4 39
10 24 5 94
11 25 5 4
12 55 5 923
13 252 5 49
Upvotes: 2