Ben
Ben

Reputation: 109

pandas unstack on reshaping the dataframe.

I have this sample dataframe

ID1,ID2,info,Value
1,50,BA,101
1,50,BA,102
2,100,CA,03
2,100,CA,07
2,100,BA,09
2,100,BA,10

To reshape the data, I am using unstack and this is more like 90% which I want and it works.

grouped = df.groupby(['ID1','ID1','ID2','info'])[['Value']].apply(lambda x: pd.DataFrame(x.values, columns=[['Value']]))\
                        .unstack().sort_index(level=1, axis=1).reset_index(drop= True, level=1).reset_index()

grouped.columns = ['{}{}'.format(a,b) for a, b in df_grouped.columns]

and this is the result I have

ID1,ID2,info,Value0,Value1
1,50,BA,101,102
2,100,BA,9,10
2,100,CA,3,7

probably it would be more clearer if I can put all the ID1 in one row and if I have two different info value corresponding to the ID1, then I would like to achieve something similar like this

ID1,ID2,info,Value0,Value1,Value2,Value3
1,50,BA,101,102,,
2,100,[BA,CA],9,10,3,7

Any suggestions ?

Upvotes: 0

Views: 38

Answers (1)

jezrael
jezrael

Reputation: 863176

Your solution should be changed by aggregate list first, then if necessary only unique values of list convert values to sets and back to lists and from another columns create DataFrame which isjoined to back:

grouped = df.groupby(['ID1','ID1','ID2']).agg(list)

grouped['info'] = grouped['info'].apply(lambda x: list(set(x)))

df1 = pd.DataFrame(grouped.pop('Value').values.tolist(),
                   index=grouped.index).add_prefix('Value')
grouped = grouped.join(df1)
print (grouped)
                 info  Value0  Value1  Value2  Value3
ID1 ID1 ID2                                          
1   1   50       [BA]     101     102     NaN     NaN
2   2   100  [BA, CA]       3       7     9.0    10.0

Upvotes: 1

Related Questions