Mengezi Dhlomo
Mengezi Dhlomo

Reputation: 335

Turn dictionaries inside a pandas column into a dataframe

I have a Dataframe which looks like this:

    cus_id          cat_counts
0   4462            {'LOUNGE': 2}
1   5482            {'MAJOR APPLIANCES': 2, 'SMALL APPLIANCES': 1}
2   8101            {'BEDDING': 1, 'MAJOR APPLIANCES': 1}
3   3388            {'MAJOR APPLIANCES': 1, 'BEDROOM SUITES': 1}
5   3030            {'ACCESSORIES': 1, 'VISUAL': 2, 'MAJOR APPLIANCES' : 2}
6   8942            {'VISUAL': 1, 'AUDIO': 1, 'BEDDING': 1}
7   5775            {'ACCESSORIES': 2, 'VISUAL': 1}
8   5489            {'ACCESSORIES': 1, 'KITCHENWARE': 1, 'BEDDING' : 1
9   9370            {'ACCESSORIES': 1, 'VISUAL': 2, 'MAJOR APPLIANCES' : 5}
10  5936            {'KITCHENWARE': 1, 'ACCESSORIES': 4, 'VISUAL': 3}
11  3854            {'ACCESSORIES': 1, 'VISUAL': 2}
12  4016            {'LOOSE CARPETING': 1, 'BEDDING': 1}
13  3503            {'HOME OFFICE': 2}

I would like to convert the cat_counts column into seperate columns joined to the main dataframe with the keys being the columns and the values being the counts in the column. Here's an example (some columns have been omitted for space):

    cus_id         LOUNGE | MAJOR APPLIANCES | SMALL APPLIANCES | BEDDING | BEDROOM SUITES | ACCESSORIES
0   4462              2           0                  0               0           0               0
1   5482              0           2                  1               0           0               0
2   8101              0           1                  0               1           0               0  
3   3388              0           1                  0               0           1               0
5   3030              0           2                  0               0           0               1
6   8942              0           0                  0               1           0               0 
7   5775              0           0                  0               0           0               2
8   5489              0           0                  0               1           0               1
9   9370              0           5                  0               0           0               1
10  5936              0           0                  0               0           0               4
11  3854              0           0                  0               0           0               1
12  4016              0           0                  0               1           0               0
13  3503              0           0                  0               0           0               0

I have tried df['cat_counts'].apply(lambda a: pd.DataFrame.from_dict(a)) but i get an error If using all scalar values, you must pass an index. I've checked this question Transform a Counter object into a Pandas DataFrame & turning a collections counter into dictionary but they don't answer my question. I don't know if this is even possible. All I know is you can create a Dataframe from a dictionary. If there is a stackoverflow post regarding this please point me in the right direction

Upvotes: 3

Views: 2022

Answers (2)

jezrael
jezrael

Reputation: 862681

If performance is not important convert each dict to Series and replace missing values to 0:

df['cat_counts'].apply(lambda a: pd.Series(a)).fillna(0)

For improve performance convert column to DataFrame:

pd.DataFrame(df['cat_counts'].tolist()).fillna(0)

All together with DataFrame.pop for extract column and DataFrame.join for add original columns:

df1 = df.join(pd.DataFrame(df.pop('cat_counts').tolist()).fillna(0).astype(int))

Upvotes: 4

PascalVKooten
PascalVKooten

Reputation: 21453

The right way to do this in pandas is using json_normalize:

from pandas.io.json import json_normalize
json_normalize(df.cat_counts)

And to join:

pd.concat([df, json_normalize(df.cat_counts)])

Upvotes: 0

Related Questions