Reputation: 93
I have an excel workbook with multiple sheets laid out like the following:
[ID] [Date] [TEST_A] [TEST_B] [TEST_C]
ID_1234 13/06/2017 11:00 1194.256258 1287.016744 1343.434
ID_1234 13/06/2017 12:00 1194.266828 1287.16688 1463.2352
...
Each sheet has a different value for the [ID] column. Some sheets go up to TEST_E and some only have TEST_A, they are not always in the same order either.
I am trying to transform the dataframe by creating a new ID_TEST column that joins the ID and each TEST columns, and transforming the rest of the columns to align with ID_TEST. Every sheet will then have four columns as follows, where the new Result column will have the values from each individual TEST column.
[ID] [ID_TEST] [Date] [Result]
ID_1234 ID_1234-TEST_A 13/06/2017 11:00 1194.256258
ID_1234 ID_1234-TEST_A 13/06/2017 12:00 1194.266828
ID_1234 ID_1234-TEST_B 13/06/2017 11:00 1287.016744
ID_1234 ID_1234-TEST_B 13/06/2017 12:00 1287.1688
ID_1234 ID_1234-TEST_C 13/06/2017 11:00 1343.434
ID_1234 ID_1234-TEST_C 13/06/2017 12:00 1463.2352
I have tried starting with creating the [ID_TEST] column using this
df.apply(lambda col: col.name +" "+ col.astype(str) )
But this applies to every column and not the ones I want specifically. I think this will also need a pivot table or something to restructure the df but I am not sure how to implement it. Thanks.
Upvotes: 0
Views: 825
Reputation: 323266
Try with melt
out = df.melt(['ID','Date'],var_name = 'ID_TEST',value_name='Result')
out['ID_TEST'] = out['ID'] + '-' + out['ID_TEST']
Upvotes: 2