Reputation: 11192
I have a df like below, I want to tranform values in to columns (pivot) operation. I am unable to perform because my data is in list.
My sample input has two columns (scores and classes). These columns are ordered values. i.e., class 19's score is 0.97 and 0.77 for class 0. I want to transform my df such that classes values will be column names and it's corresponding scores will be in respective column.
Sample Input:
file_name scores classes
0 voc_32.jpg [0.97, 0.77] [19.0, 0.0]
1 voc_22.jpg [0.92, 0.64, 0.83, 0.55] [17.0, 1.0, 11.0, 11.0]
Expected output:
file_name 0 1 11 17 19
0 voc_32.jpg 0.77 0.97
1 voc_22.jpg 0.64 [0.83, 0.55] 0.92
Any help would be appreciable.
Upvotes: 0
Views: 66
Reputation: 862581
Create list of dictionaries in list comprehension and pass to DataFrame
constructor, last add to original by DataFrame.join
:
df1 = (pd.DataFrame([dict(zip(b, a)) for a, b in zip(df.scores, df.classes)],
index=df.index).sort_index(axis=1).rename(columns=int))
df2 = df[['file_name']].join(df1)
Similar solution with drop columns by DataFrame.pop
:
df1 = (pd.DataFrame([dict(zip(b, a)) for a, b in zip(df.pop('scores'), df.pop('classes'))],
index=df.index).sort_index(axis=1).rename(columns=int))
df2 = df.join(df1)
print (df2)
file_name 0 1 11 17 19
0 voc_32.jpg 0.77 NaN NaN NaN 0.97
1 voc_22.jpg NaN 0.64 0.83 0.92 NaN
EDIT: For list if multiple classes use Series.explode
for flattem, then aggregate custom function in GroupBy.agg
with reshape by Series.unstack
:
f = lambda x: list(x) if len(x) > 1 else x
df1 = (df.apply(pd.Series.explode)
.groupby(['file_name','classes'])['scores']
.agg(f)
.unstack()
.rename(columns=int))
print (df1)
classes 0 1 11 17 19
file_name
voc_22.jpg NaN 0.64 [0.83, 0.85] 0.92 NaN
voc_32.jpg 0.77 NaN NaN NaN 0.97
df2 = df[['file_name']].join(df1, on='file_name')
print (df2)
file_name 0 1 11 17 19
0 voc_32.jpg 0.77 NaN NaN NaN 0.97
1 voc_22.jpg NaN 0.64 [0.83, 0.85] 0.92 NaN
Upvotes: 3