Reputation: 1851
I have a df that looks like this
COL1 COL2 COL3
0 ABC N [{u'PERF_PCT': 0.2,u'PERF_DATE': 12},{u'PERF_PCT': 0.3,u'PERF_DATE': 13}]
1 XYZ N [{u'PERF_PCT': 0.6,u'PERF_DATE': 12},{u'PERF_PCT': 0.2,u'PERF_DATE': 13},{u'PERF_PCT': 0.7,u'PERF_DATE': 14}]
I need to unpack column 3 such that the dataframe has additional rows for each dictionary in the list in COL3 and additional columns for each key in the dict. The key-value pairs in the dict are fixed.
COL1 COL2 PERF_PCT PERF_DATE
0 ABC N 0.2 12
1 ABC N 0.3 13
2 XYZ N 0.6 12
3 XYZ N 0.2 13
4 XYZ N 0.7 14
I have been able to accomplish the same using a for loop but I need something extremely performant. The resultant df can have around 170k records and currently using the for loop it takes about 20+ seconds which is unacceptable. I am hoping using pandas specific apply or other functions can make this faster but have not been able to do so. If there's an extremely fast way to do this I'll be really grateful. Thanks.
Upvotes: 1
Views: 4249
Reputation: 323226
Try this :)
Idx=df.set_index(['COL1','COL2']).COL3.apply(pd.Series).stack().index
pd.DataFrame(df.set_index(['COL1','COL2']).COL3.apply(pd.Series).stack().values.tolist(),index=Idx).reset_index().drop('level_2',1)
Out[318]:
COL1 COL2 PERF_DATE PERF_PCT
0 ABC N 12 0.2
1 ABC N 13 0.3
2 XYZ N 12 0.6
3 XYZ N 13 0.2
4 XYZ N 14 0.7
Data Input
df = pd.DataFrame({'COL1':['ABC','XYZ'],
'COL2': ['N','N'],
'COL3' :[[{u'PERF_PCT': 0.2,u'PERF_DATE': 12},{u'PERF_PCT': 0.3,u'PERF_DATE': 13}],[{u'PERF_PCT': 0.6,u'PERF_DATE': 12},{u'PERF_PCT': 0.2,u'PERF_DATE': 13},{u'PERF_PCT': 0.7,u'PERF_DATE': 14}]] })
Upvotes: 5