Reputation: 481
I am puzzled. I have two data frames and I want to unstack them. The first data frame works perfectly.
df:
"",Measurement data name,Speed
0,100UM_S304_T6.3_F3400_P-2.5_G1.4_D100_X012820_S2,4.845373130234829
1,100UM_S304_T6.3_F3400_P-2.5_G1.4_D100_X012820_S3,4.263109524556444
2,100UM_S304_T6.3_F3400_P-2.5_G1.4_D100_X012820_S4,4.477553508022049
3,100UM_S304_T6.3_F3400_P-3.0_G1.4_D100_X012820_S2,4.225669481330404
4,100UM_S304_T6.3_F3400_P-3.0_G1.4_D100_X012820_S3,3.964186569036525
5,100UM_S304_T6.3_F3400_P-3.0_G1.4_D100_X012820_S4,4.381883773694611
6,100UM_S304_T6.3_F3400_P-3.5_G1.4_D100_X012820_S2,4.4611936089867035
7,100UM_S304_T6.3_F3400_P-3.5_G1.4_D100_X012820_S3,4.011543928072122
8,100UM_S304_T6.3_F3400_P-3.5_G1.4_D100_X012820_S4,4.760764146212687
-
new_df = (df.join(df['Measurement data name'].str.rsplit('_', n=1, expand=True))
.set_index([0, 1])
.drop('Measurement data name', axis=1)
.unstack(1))
new_cols = [('{1} {0}'.format(*tup)) for tup in new_df.columns]
new_df.columns = new_cols
final_df = new_df.reset_index()
final_df = final_df.rename(columns={0: 'Measurement data name'})
final_df:
"",Measurement data name,S2 Speed,S3 Speed,S4 Speed
0,100UM_S304_T6.3_F3400_P-2.5_G1.4_D100_X012820,4.845373130234829,4.263109524556444,4.477553508022049
1,100UM_S304_T6.3_F3400_P-3.0_G1.4_D100_X012820,4.225669481330404,3.964186569036525,4.381883773694611
2,100UM_S304_T6.3_F3400_P-3.5_G1.4_D100_X012820,4.4611936089867035,4.011543928072122,4.760764146212687
The second data frame doesn't behave the same way.
df:
"",Measurement data name,Speed
0,VR-20200211_161131_A5052_T6.2_F3200_P0.0_G1.3_50UM_X111519_S2,11.157404940816864
1,VR-20200211_161321_A5052_T6.2_F3200_P0.0_G1.3_50UM_X111519_S3,10.167709975092029
2,VR-20200211_161454_A5052_T6.2_F3200_P0.0_G1.3_50UM_X111519_S4,9.888377066223338
3,VR-20200211_162028_A5052_T6.2_F3200_P-2.0_G1.3_50UM_X111519_S2,7.240216451403143
4,VR-20200211_175514_A5052_T6.2_F3200_P-2.0_G1.3_50UM_X111519_S3,8.311288630510798
5,VR-20200212_090341_A5052_T6.2_F3200_P-2.0_G1.3_50UM_X111519_S4,7.6884571127601555
-
new_df = (df.join(df['Measurement data name'].str.rsplit('_', n=1, expand=True))
.set_index([0, 1])
.drop('Measurement data name', axis=1)
.unstack(1))
new_cols = [('{1} {0}'.format(*tup)) for tup in new_df.columns]
new_df.columns = new_cols
final_df = new_df.reset_index()
final_df = final_df.rename(columns={0: 'Measurement data name'})
final_df:
"",Measurement data name,S2 Speed,S3 Speed,S4 Speed
0,VR-20200211_161131_A5052_T6.2_F3200_P0.0_G1.3_50UM_X111519,11.157404940816864,,
1,VR-20200211_161321_A5052_T6.2_F3200_P0.0_G1.3_50UM_X111519,,10.167709975092029,
2,VR-20200211_161454_A5052_T6.2_F3200_P0.0_G1.3_50UM_X111519,,,9.888377066223338
3,VR-20200211_162028_A5052_T6.2_F3200_P-2.0_G1.3_50UM_X111519,7.240216451403143,,
4,VR-20200211_175514_A5052_T6.2_F3200_P-2.0_G1.3_50UM_X111519,,8.311288630510798,
5,VR-20200212_090341_A5052_T6.2_F3200_P-2.0_G1.3_50UM_X111519,,,7.6884571127601555
Any ideas why? Thanks.
Upvotes: 0
Views: 38
Reputation: 28303
Each Measurement data name in your second dataframe is still unique after you split off the trailing S2/S3/S4 suffix, so unstack has nothing to line up and every name keeps its own row.
It looks like the time-of-day component is what makes the names unique.
# from df2['Measurement data name']
VR-20200211_161131_A5052_T6.2_F3200_P0.0_G1.3_50UM_X111519_S2
VR-20200211_161321_A5052_T6.2_F3200_P0.0_G1.3_50UM_X111519_S3
VR-20200211_161454_A5052_T6.2_F3200_P0.0_G1.3_50UM_X111519_S4
            ^^^^^^
...
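One possible way around this, as a rough sketch: drop the leading date and time stamp before splitting, so that names differing only in their timestamp collapse into one row. This assumes the `VR-<date>_<time>_` prefix carries no meaning for the grouping, which only you can confirm. The regex and the names `stripped`, `parts` and `sensor` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Measurement data name': [
        'VR-20200211_161131_A5052_T6.2_F3200_P0.0_G1.3_50UM_X111519_S2',
        'VR-20200211_161321_A5052_T6.2_F3200_P0.0_G1.3_50UM_X111519_S3',
        'VR-20200211_161454_A5052_T6.2_F3200_P0.0_G1.3_50UM_X111519_S4',
        'VR-20200211_162028_A5052_T6.2_F3200_P-2.0_G1.3_50UM_X111519_S2',
        'VR-20200211_175514_A5052_T6.2_F3200_P-2.0_G1.3_50UM_X111519_S3',
        'VR-20200212_090341_A5052_T6.2_F3200_P-2.0_G1.3_50UM_X111519_S4',
    ],
    'Speed': [11.157404940816864, 10.167709975092029, 9.888377066223338,
              7.240216451403143, 8.311288630510798, 7.6884571127601555],
})

# Diagnosis: after splitting off the suffix, every base name is still distinct.
base = df['Measurement data name'].str.rsplit('_', n=1, expand=True)[0]
print(base.nunique(), len(df))  # same number, so nothing can be merged

# Assumption: the "VR-<8-digit date>_<6-digit time>_" prefix can be discarded.
stripped = df['Measurement data name'].str.replace(
    r'^VR-\d{8}_\d{6}_', '', regex=True)

# Split the remaining name into the base name and the S2/S3/S4 suffix.
parts = stripped.str.rsplit('_', n=1, expand=True)
parts.columns = ['Measurement data name', 'sensor']

# One row per base name, one "<suffix> Speed" column per suffix.
final_df = (df[['Speed']].join(parts)
            .pivot(index='Measurement data name', columns='sensor', values='Speed')
            .add_suffix(' Speed')
            .rename_axis(columns=None)
            .reset_index())
print(final_df)
```

With the sample rows above this yields two rows (P-2.0 and P0.0) with the S2, S3 and S4 speeds filled in. If the date or time does matter for your grouping, you would need a different key.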
Upvotes: 1