Fizi
Fizi

Reputation: 1851

Pandas: unpack a column with list of dict values into multiple columns

I have a df that looks like this

            COL1   COL2    COL3                             
0         ABC      N       [{u'PERF_PCT': 0.2,u'PERF_DATE': 12},{u'PERF_PCT': 0.3,u'PERF_DATE': 13}]
1         XYZ      N       [{u'PERF_PCT': 0.6,u'PERF_DATE': 12},{u'PERF_PCT': 0.2,u'PERF_DATE': 13},{u'PERF_PCT': 0.7,u'PERF_DATE': 14}] 

I need to unpack column 3 such that the dataframe has additional rows for each dictionary in the list in COL3 and additional columns for each key in the dict. The key-value pairs in the dict are fixed.

          COL1    COL2     PERF_PCT       PERF_DATE              
0         ABC      N       0.2            12
1         ABC      N       0.3            13
2         XYZ      N       0.6            12 
3         XYZ      N       0.2            13
4         XYZ      N       0.7            14 

I have been able to accomplish the same using a for loop but I need something extremely performant. The resultant df can have around 170k records and currently using the for loop it takes about 20+ seconds which is unacceptable. I am hoping using pandas specific apply or other functions can make this faster but have not been able to do so. If there's an extremely fast way to do this I'll be really grateful. Thanks.

Upvotes: 1

Views: 4249

Answers (1)

BENY
BENY

Reputation: 323226

Try this :)

Idx=df.set_index(['COL1','COL2']).COL3.apply(pd.Series).stack().index

pd.DataFrame(df.set_index(['COL1','COL2']).COL3.apply(pd.Series).stack().values.tolist(),index=Idx).reset_index().drop('level_2',1)

Out[318]: 
  COL1 COL2  PERF_DATE  PERF_PCT
0  ABC    N         12       0.2
1  ABC    N         13       0.3
2  XYZ    N         12       0.6
3  XYZ    N         13       0.2
4  XYZ    N         14       0.7

Data Input

df = pd.DataFrame({'COL1':['ABC','XYZ'],
              'COL2': ['N','N'],
               'COL3' :[[{u'PERF_PCT': 0.2,u'PERF_DATE': 12},{u'PERF_PCT': 0.3,u'PERF_DATE': 13}],[{u'PERF_PCT': 0.6,u'PERF_DATE': 12},{u'PERF_PCT': 0.2,u'PERF_DATE': 13},{u'PERF_PCT': 0.7,u'PERF_DATE': 14}]]   })

Upvotes: 5

Related Questions