Reputation: 854
I have a dataframe :
Drug_Name CurrentYear PMPM Cost
Drug A 201901 25 10
Drug B 201902 25 20
Drug C 201903 50 30
Drug D 202001 75 25
Drug E 202002 100 100
And i want to convert this into :
Drug_Name Current Year ComparisionYear Measure_Name Measure_Value_Current Measure_Value_Comparision
Drug A 201901 201901 PMPM 25 25
Drug A 201901 201901 Cost 10 10
Drug B 201902 201901 PMPM 25 25
Drug B 201902 201901 Cost 20 10
Drug C 201903 201901 PMPM 50 25
Drug C 201903 201901 Cost 30 10
Drug C 201903 201902 PMPM 50 25
Drug C 201903 201902 Cost 30 20
Drug D 202001 201901 PMPM 75 25
Drug D 202001 201901 Cost 25 10
Drug D 202001 201902 PMPM 75 25
Drug D 202001 201902 Cost 25 20
Drug D 202001 201903 PMPM 75 50
Drug D 202001 201903 Cost 25 30
Drug E 202002 201901 PMPM 100 25
Drug E 202002 201901 Cost 100 10
Drug E 202002 201902 PMPM 100 25
Drug E 202002 201902 Cost 100 20
Drug E 202002 201903 PMPM 100 50
Drug E 202002 201903 Cost 100 30
Drug E 202002 202001 PMPM 100 75
Drug E 202002 202001 Cost 100 25
The idea is not just to pivot but also to add these derived columns for every possible combinations
Upvotes: 0
Views: 25
Reputation: 2579
So we are going to do a 3 step process.
-- Get all the combo of drugs with all information --
We are going to get all the combo of drugs then save it in a df called allCombo
import itertools
import pandas as pd
allCombo = pd.DataFrame(list(itertools.product(df['Drug_Name'], repeat = 2)), columns = ['Drug1','Drug2'])
Next we are going to make the dtype of the orignial df's drugs a catagory, and ensure that they are ordered. This will allow us to have a significantly easier time filtering when we have to.
We are also going to melt your original df to put PMPM and Cost on rows as you have it in your final df.
df['Drug_Name'] = df['Drug_Name'].astype(pd.api.types.CategoricalDtype(df['Drug_Name'].to_list(), ordered = True))
df= df.melt(['Drug_Name','CurrentYear'], var_name = 'Measure_Name', value_name = 'Measure_Value')
Next we are going to merge our melted original df two times with our allCombo df. After this we have all the information of both drugs we want to compare on each row of the df.
merged = allCombo.merge(df, left_on = 'Drug1', right_on = 'Drug_Name').merge(df, left_on ='Drug2', right_on='Drug_Name')
-- Filter it down to what we are looking for --
Next, we will take our merged df and filter it to only display the rows that we want. You will see here how we used the ordered catagory to help us out here.
filtered = merged[(merged['Drug_Name_x'] < merged['Drug_Name_y'])& (merged['Measure_Name_x']==merged['Measure_Name_y']) |
((merged['Drug_Name_x'] == 'Drug A') &(merged['Drug_Name_y'] == 'Drug A') & (merged['Measure_Name_x']==merged['Measure_Name_y']))].copy()
After the or statement in this filter is specifically due to the fact you want only Drug A compared with Drug A, but not any of the other drugs compared to themselves. We now have the df you are looking for.
-- Clean the df (ie col names, reset the index ect) --
Next is simple dropping of columns, and renaming of columns to match what you wanted.
filtered.drop(columns = ['Drug_Name_x', 'Measure_Name_x'], inplace = True)
filtered.rename(columns = {'Drug_Name_y':'Drug_Name',
'CurrentYear_y':'Current Year',
'CurrentYear_x':'ComparisionYear',
'Measure_Name_y':'Measure_Name',
'Measure_Value_y':'Measure_Value_Current',
'Measure_Value_x':'Measure_Value_Comparision'
}, inplace = True)
final = filtered[['Drug_Name', 'Current Year', 'ComparisionYear', 'Measure_Name', 'Measure_Value_Current', 'Measure_Value_Comparision']].reset_index(drop = True)
And now you have your final df!
Upvotes: 1