Turn dictionaries inside a pandas column into a dataframe

Question

I have a Dataframe which looks like this:

    cus_id          cat_counts
0   4462            {'LOUNGE': 2}
1   5482            {'MAJOR APPLIANCES': 2, 'SMALL APPLIANCES': 1}
2   8101            {'BEDDING': 1, 'MAJOR APPLIANCES': 1}
3   3388            {'MAJOR APPLIANCES': 1, 'BEDROOM SUITES': 1}
5   3030            {'ACCESSORIES': 1, 'VISUAL': 2, 'MAJOR APPLIANCES' : 2}
6   8942            {'VISUAL': 1, 'AUDIO': 1, 'BEDDING': 1}
7   5775            {'ACCESSORIES': 2, 'VISUAL': 1}
8   5489            {'ACCESSORIES': 1, 'KITCHENWARE': 1, 'BEDDING' : 1
9   9370            {'ACCESSORIES': 1, 'VISUAL': 2, 'MAJOR APPLIANCES' : 5}
10  5936            {'KITCHENWARE': 1, 'ACCESSORIES': 4, 'VISUAL': 3}
11  3854            {'ACCESSORIES': 1, 'VISUAL': 2}
12  4016            {'LOOSE CARPETING': 1, 'BEDDING': 1}
13  3503            {'HOME OFFICE': 2}

I would like to convert the cat_counts column into seperate columns joined to the main dataframe with the keys being the columns and the values being the counts in the column. Here's an example (some columns have been omitted for space):

    cus_id         LOUNGE | MAJOR APPLIANCES | SMALL APPLIANCES | BEDDING | BEDROOM SUITES | ACCESSORIES
0   4462              2           0                  0               0           0               0
1   5482              0           2                  1               0           0               0
2   8101              0           1                  0               1           0               0  
3   3388              0           1                  0               0           1               0
5   3030              0           2                  0               0           0               1
6   8942              0           0                  0               1           0               0 
7   5775              0           0                  0               0           0               2
8   5489              0           0                  0               1           0               1
9   9370              0           5                  0               0           0               1
10  5936              0           0                  0               0           0               4
11  3854              0           0                  0               0           0               1
12  4016              0           0                  0               1           0               0
13  3503              0           0                  0               0           0               0

I have tried df['cat_counts'].apply(lambda a: pd.DataFrame.from_dict(a)) but i get an error If using all scalar values, you must pass an index. I've checked this question Transform a Counter object into a Pandas DataFrame & turning a collections counter into dictionary but they don't answer my question. I don't know if this is even possible. All I know is you can create a Dataframe from a dictionary. If there is a stackoverflow post regarding this please point me in the right direction

jezrael · Accepted Answer

If performance is not important convert each dict to Series and replace missing values to 0:

df['cat_counts'].apply(lambda a: pd.Series(a)).fillna(0)

For improve performance convert column to DataFrame:

pd.DataFrame(df['cat_counts'].tolist()).fillna(0)

All together with DataFrame.pop for extract column and DataFrame.join for add original columns:

df1 = df.join(pd.DataFrame(df.pop('cat_counts').tolist()).fillna(0).astype(int))

Turn dictionaries inside a pandas column into a dataframe

Answers (2)

Related Questions