nucsit026
nucsit026

Reputation: 720

Pivot_table from lists in a column value

I have a dataframe like:

    ID       Sim        Items          
    1        0.345      [7,7]            
    2        0.604      [2,7,3,8,5]      
    3        0.082      [9,1,9,1]

I want to form a pivot_table by:

df.pivot_table(index ="ID" , columns = "Items", values="Sim")

To do that, I have to extract list elements in items column and repeat the ID,Sim values for each unique elements in row_list. To be as:

    ID       Sim        Items          
    1        0.345        7  

    2        0.604        2 
    2        0.604        7  
    2        0.604        3  
    2        0.604        8  
    2        0.604        5

    3        0.082        9
    3        0.082        1

pivot table :

         7       2       3       8      5      1      9

    1  0.345     -       -       -      -      -      -
    2  0.604   0.604   0.604   0.604   0.604
    3    -      -        -       -     -     0.082  0.082        

Is there any pythonic approach for that? Or any suggestions?

Upvotes: 1

Views: 1405

Answers (1)

anky
anky

Reputation: 75100

Use explode(new in pandas 0.25+) before pivot;

df.explode('Items').pivot_table(index ="ID" , columns = "Items", values="Sim")

Items      1      2      3      5      7      8      9
ID                                                    
1        NaN    NaN    NaN    NaN  0.345    NaN    NaN
2        NaN  0.604  0.604  0.604  0.604  0.604    NaN
3      0.082    NaN    NaN    NaN    NaN    NaN  0.082

for lower versions of pandas, you can try with:

(df.drop('Items',1).join(pd.DataFrame(df['Items'].tolist())
 .stack(dropna=False).droplevel(1).rename('Items'))
  .pivot_table(index ="ID" , columns = "Items", values="Sim"))

Items      1      2      3      5      7      8      9
ID                                                    
1        NaN    NaN    NaN    NaN  0.345    NaN    NaN
2        NaN  0.604  0.604  0.604  0.604  0.604    NaN
3      0.082    NaN    NaN    NaN    NaN    NaN  0.082

If exact ordering matters , use reindex with unique of Items after explode:

(df.explode('Items').pivot_table(index ="ID" , columns = "Items", values="Sim")
       .reindex(df.explode('Items')['Items'].unique(),axis=1))

Items      7      2      3      8      5      9      1
ID                                                    
1      0.345    NaN    NaN    NaN    NaN    NaN    NaN
2      0.604  0.604  0.604  0.604  0.604    NaN    NaN
3        NaN    NaN    NaN    NaN    NaN  0.082  0.082

Upvotes: 2

Related Questions