Reputation: 119
I have pandas df with one column having dictionary of dictionaries: this is what i have:
| ca| cb|
|:---- |:------:
| 1 | cat1:{paws:1 , hair:2} ,dog1:{paws:3 , hair:5}
| 2 | cat2:{paws:1 , hair:2} ,dog2:{paws:3 , hair:5}
| 3 | cat3:{paws:1 , hair:2} ,dog3:{paws:3 , hair:5}
| 4 | cat4:{paws:1 , hair:2} ,dog4:{paws:3 , hair:5}
What i want is:
| ca| animal| paws| hair|
|:----:| -----:| -----:| -----:|
| 1 | cat1 | 1 | 2
| 1 | dog1 | 3 | 5
| 2 | cat2 | 1 | 2
| 2 | dog2 | 3 | 5
What should be fastest way to do this?
Upvotes: 0
Views: 211
Reputation: 41
I found a solution below:
I reproduced a demo version of your dataframe with the following dictionary for the following outcome:
data = {
"ca": [1, 2],
"cb": [{"cat1": {"paws": 1, "hair": 2}, "dog1":{"paws":3 , "hair":5}},
{"cat2":{"paws":1 , "hair":2} , "dog2":{"paws":3 , "hair":5}}]
}
df = pandas.DataFrame(data)
df
ca cb
1 {'cat1': {'paws': 1, 'hair': 2}, 'dog1': {'paw...
2 {'cat2': {'paws': 1, 'hair': 2}, 'dog2': {'paw...
Moving on, I had to eliminate the first step of the dict, ie, extracting the cats and the dogs alike.
first_level = pandas.concat([df.drop(['cb'], axis=1), df['cb'].apply(pandas.Series)], axis=1)
first_level
ca cat1 dog1 cat2 dog2
0 1 {'paws': 1, 'hair': 2} {'paws': 3, 'hair': 5} NaN NaN
1 2 NaN NaN {'paws': 1, 'hair': 2} {'paws': 3, 'hair': 5}
The key takeaway here is that you need to apply the melt function to convert the columns to values and set them to respective rows.
first_level.melt(id_vars=["ca"]).dropna()
first_level
ca variable value
0 1 cat1 {'paws': 1, 'hair': 2}
2 1 dog1 {'paws': 3, 'hair': 5}
5 2 cat2 {'paws': 1, 'hair': 2}
7 2 dog2 {'paws': 3, 'hair': 5}
And then the rest is quite simple, with the same apply function, I can convert this dictionary to columns too and the issue is resolved:
second_level = pandas.concat([first_level.drop(['value'], axis=1), first_level['value'].apply(pandas.Series)], axis=1)
second_level
ca variable paws hair
0 1 cat1 1 2
2 1 dog1 3 5
5 2 cat2 1 2
7 2 dog2 3 5
Upvotes: 1