Mohamed Thasin ah
Mohamed Thasin ah

Reputation: 11192

How to transform list of values into pandas columns?

I have a df like below, I want to tranform values in to columns (pivot) operation. I am unable to perform because my data is in list.

My sample input has two columns (scores and classes). These columns are ordered values. i.e., class 19's score is 0.97 and 0.77 for class 0. I want to transform my df such that classes values will be column names and it's corresponding scores will be in respective column.

Sample Input:

    file_name        scores                         classes  
0  voc_32.jpg  [0.97, 0.77]                     [19.0, 0.0]   
1  voc_22.jpg  [0.92, 0.64, 0.83, 0.55]         [17.0, 1.0, 11.0, 11.0]   

Expected output:

    file_name  0      1     11           17     19
0  voc_32.jpg  0.77                            0.97 
1  voc_22.jpg         0.64  [0.83, 0.55]       0.92

Any help would be appreciable.

Upvotes: 0

Views: 66

Answers (1)

jezrael
jezrael

Reputation: 862581

Create list of dictionaries in list comprehension and pass to DataFrame constructor, last add to original by DataFrame.join:

df1 = (pd.DataFrame([dict(zip(b, a)) for a, b in zip(df.scores, df.classes)], 
                     index=df.index).sort_index(axis=1).rename(columns=int))
df2 = df[['file_name']].join(df1)

Similar solution with drop columns by DataFrame.pop:

df1 = (pd.DataFrame([dict(zip(b, a)) for a, b in zip(df.pop('scores'), df.pop('classes'))], 
                     index=df.index).sort_index(axis=1).rename(columns=int))
df2 = df.join(df1)
print (df2)
    file_name     0     1    11    17    19
0  voc_32.jpg  0.77   NaN   NaN   NaN  0.97
1  voc_22.jpg   NaN  0.64  0.83  0.92   NaN

EDIT: For list if multiple classes use Series.explode for flattem, then aggregate custom function in GroupBy.agg with reshape by Series.unstack:

f = lambda x: list(x) if len(x) > 1 else x
df1 = (df.apply(pd.Series.explode)
         .groupby(['file_name','classes'])['scores']
         .agg(f)
         .unstack()
         .rename(columns=int))
print (df1)
classes       0     1             11    17    19
file_name                                       
voc_22.jpg   NaN  0.64  [0.83, 0.85]  0.92   NaN
voc_32.jpg  0.77   NaN           NaN   NaN  0.97

df2 = df[['file_name']].join(df1, on='file_name')
print (df2)

    file_name     0     1            11    17    19
0  voc_32.jpg  0.77   NaN           NaN   NaN  0.97
1  voc_22.jpg   NaN  0.64  [0.83, 0.85]  0.92   NaN

Upvotes: 3

Related Questions