Reputation: 335
I have a Dataframe
which looks like this:
cus_id cat_counts
0 4462 {'LOUNGE': 2}
1 5482 {'MAJOR APPLIANCES': 2, 'SMALL APPLIANCES': 1}
2 8101 {'BEDDING': 1, 'MAJOR APPLIANCES': 1}
3 3388 {'MAJOR APPLIANCES': 1, 'BEDROOM SUITES': 1}
5 3030 {'ACCESSORIES': 1, 'VISUAL': 2, 'MAJOR APPLIANCES' : 2}
6 8942 {'VISUAL': 1, 'AUDIO': 1, 'BEDDING': 1}
7 5775 {'ACCESSORIES': 2, 'VISUAL': 1}
8 5489 {'ACCESSORIES': 1, 'KITCHENWARE': 1, 'BEDDING' : 1
9 9370 {'ACCESSORIES': 1, 'VISUAL': 2, 'MAJOR APPLIANCES' : 5}
10 5936 {'KITCHENWARE': 1, 'ACCESSORIES': 4, 'VISUAL': 3}
11 3854 {'ACCESSORIES': 1, 'VISUAL': 2}
12 4016 {'LOOSE CARPETING': 1, 'BEDDING': 1}
13 3503 {'HOME OFFICE': 2}
I would like to convert the cat_counts
column into seperate columns joined to the main dataframe with the key
s being the columns and the value
s being the counts in the column. Here's an example (some columns have been omitted for space):
cus_id LOUNGE | MAJOR APPLIANCES | SMALL APPLIANCES | BEDDING | BEDROOM SUITES | ACCESSORIES
0 4462 2 0 0 0 0 0
1 5482 0 2 1 0 0 0
2 8101 0 1 0 1 0 0
3 3388 0 1 0 0 1 0
5 3030 0 2 0 0 0 1
6 8942 0 0 0 1 0 0
7 5775 0 0 0 0 0 2
8 5489 0 0 0 1 0 1
9 9370 0 5 0 0 0 1
10 5936 0 0 0 0 0 4
11 3854 0 0 0 0 0 1
12 4016 0 0 0 1 0 0
13 3503 0 0 0 0 0 0
I have tried df['cat_counts'].apply(lambda a: pd.DataFrame.from_dict(a))
but i get an error
If using all scalar values, you must pass an index
. I've checked this question
Transform a Counter object into a Pandas DataFrame &
turning a collections counter into dictionary
but they don't answer my question. I don't know if this is even possible. All I know is you can create a Dataframe
from a dictionary. If there is a stackoverflow post regarding this please point me in the right direction
Upvotes: 3
Views: 2022
Reputation: 862681
If performance is not important convert each dict to Series
and replace missing values to 0
:
df['cat_counts'].apply(lambda a: pd.Series(a)).fillna(0)
For improve performance convert column to DataFrame
:
pd.DataFrame(df['cat_counts'].tolist()).fillna(0)
All together with DataFrame.pop
for extract column and DataFrame.join
for add original columns:
df1 = df.join(pd.DataFrame(df.pop('cat_counts').tolist()).fillna(0).astype(int))
Upvotes: 4
Reputation: 21453
The right way to do this in pandas is using json_normalize
:
from pandas.io.json import json_normalize
json_normalize(df.cat_counts)
And to join:
pd.concat([df, json_normalize(df.cat_counts)])
Upvotes: 0