user3302483

Reputation: 855

How do I convert a dataframe column that contains dictionaries into separate columns?

I have a pandas dataframe whose first column, card, holds dictionaries, although its dtype is object. I want to convert this column into 3 separate columns attached to the original dataframe, one for each of the 3 keys in the dictionary: name, cost and id. My dataframe looks like this:

                                                 card    player  turn
0   {'name': 'Tap', 'cost': 2, 'id': '056'}        me     2
1   {'name': 'Coin', 'cost': None, 'id': '051'}  opponent     2
2   {'name': 'Pawnbroker', 'cost': 3,'id': '055'}     2
3   {'name': 'fire', 'cost': 2, 'id': 'E1_596'}        me     3
4   {'name': 'Coil', 'cost': 1, 'id': 'E1_56'}        me     3
5   {'name': 'Pawnbroker', 'cost': 3, 'id': 'E6'}     3

Upvotes: 0

Views: 52

Answers (2)

DYZ

Reputation: 57033

Suppose your dictionary column is called 'foo':

df = pd.concat([df, df['foo'].apply(pd.Series)], axis=1)
#       card                                          foo  player turn  cost   id  name
#0        me      {'cost': 2, 'id': '056', 'name': 'Tap'}       2        2.0  056   Tap
#1  opponent  {'cost': None, 'id': '051', 'name': 'Coin'}       2        NaN  051  Coin

You can now delete the unwanted column:

del df['foo']; print(df)
#       card  player turn  cost   id  name
#0        me       2        2.0  056   Tap
#1  opponent       2        NaN  051  Coin
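For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same apply(pd.Series) idea. The two sample rows and the use of 'card' as the dictionary column name are assumptions taken from the question's data, and drop(columns=...) is used instead of del so the original frame is left untouched.

```python
import pandas as pd

# Sample data modelled on the question (assumed; only two rows are used).
df = pd.DataFrame({
    "card": [
        {"name": "Tap", "cost": 2, "id": "056"},
        {"name": "Coin", "cost": None, "id": "051"},
    ],
    "player": ["me", "opponent"],
    "turn": [2, 2],
})

# Turn each dictionary into a row whose columns are the dictionary keys.
expanded = df["card"].apply(pd.Series)

# Attach the new columns and leave out the original dictionary column.
result = pd.concat([df.drop(columns="card"), expanded], axis=1)
print(result)
```

The missing cost for 'Coin' shows up as a null value in the expanded column, so it can be handled later with the usual isna/fillna tools.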

Upvotes: 3

jezrael

Reputation: 862571

You can use pop to remove the card column from the dataframe, pass its values to the DataFrame constructor, and join the result back with concat:

print (pd.concat([df, pd.DataFrame(df.pop('card').values.tolist())],axis=1))
     player  turn  cost      id        name
0        me   2.0   2.0     056         Tap
1  opponent   2.0   NaN     051        Coin
2         2   NaN   3.0     055  Pawnbroker
3        me   3.0   2.0  E1_596        fire
4        me   3.0   1.0   E1_56        Coil
5         3   NaN   3.0      E6  Pawnbroker
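One thing to watch with this approach: pop modifies df in place, and a DataFrame built from a plain list gets a fresh 0..n-1 index, so concat only lines up correctly when df also has that default index. Below is a minimal sketch that passes the original index through explicitly. The sample rows and the non-default index values 10 and 20 are assumptions made up for illustration.

```python
import pandas as pd

# Sample data modelled on the question, with a deliberately non-default index
# (the index values are hypothetical, chosen only to show the alignment issue).
df = pd.DataFrame(
    {
        "card": [
            {"name": "Tap", "cost": 2, "id": "056"},
            {"name": "Coin", "cost": None, "id": "051"},
        ],
        "player": ["me", "opponent"],
        "turn": [2, 2],
    },
    index=[10, 20],
)

# pop removes 'card' from df in place and returns it as a Series.
cards = df.pop("card")

# Build the new columns from the list of dictionaries, reusing the original
# index so that concat aligns row by row.
expanded = pd.DataFrame(cards.tolist(), index=cards.index)

result = pd.concat([df, expanded], axis=1)
print(result)
```

If the original df has to stay intact, working on df.copy() before calling pop is one option.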

Timings:

#[6000 rows x 3 columns]
df = pd.concat([df]*1000).reset_index(drop=True)

In [391]: %timeit (df['card'].apply(pd.Series))
1 loop, best of 3: 1.26 s per loop

In [392]: %timeit (pd.DataFrame(df['card'].values.tolist()))
100 loops, best of 3: 6.72 ms per loop
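The comparison above relies on IPython's %timeit. To reproduce it in a plain script, a rough sketch using the standard timeit module could look like this. The two-row sample, the repeat factor of 500 and the run count of 3 are assumptions picked to keep the run short, so the absolute numbers will not match the ones shown above.

```python
import timeit

import pandas as pd

# Small sample modelled on the question, repeated to get 1000 rows (assumed size).
base = pd.DataFrame({
    "card": [
        {"name": "Tap", "cost": 2, "id": "056"},
        {"name": "Coin", "cost": None, "id": "051"},
    ],
})
df = pd.concat([base] * 500, ignore_index=True)


def with_apply():
    # Builds one Series per row, then assembles them into a DataFrame.
    return df["card"].apply(pd.Series)


def with_constructor():
    # Hands the whole list of dictionaries to the DataFrame constructor at once.
    return pd.DataFrame(df["card"].tolist(), index=df.index)


# Total time in seconds for 3 calls of each approach.
t_apply = timeit.timeit(with_apply, number=3)
t_ctor = timeit.timeit(with_constructor, number=3)

print(f"apply(pd.Series):      {t_apply:.4f} s for 3 runs")
print(f"DataFrame constructor: {t_ctor:.4f} s for 3 runs")
```

Results depend on the machine and the pandas version, so it is worth rerunning on data of a realistic size before choosing between the two.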

Upvotes: 1
