Reputation: 123
I am trying to implement the Gale-Shapley algorithm in Python to produce stable matches of doctors and hospitals. To do so, I gave every doctor and every hospital a random preference, represented by a number.
(Image: dataframe consisting of preferences)
Afterwards I wrote code that rates every hospital for one specific doctor (identified by ID) and then ranks those ratings, creating two new columns. To rate a match, I take the absolute value of the difference between the two preferences, where a lower value is a better match. This is the code for the first doctor:
doctors_sorted_by_preference['Rating of Hospital by Doctor 1']=abs(doctors_sorted_by_preference['Preference Doctor'].iloc[0]-doctors_sorted_by_preference['Preference Hospital'])
doctors_sorted_by_preference['Rank of Hospital by Doctor 1']=doctors_sorted_by_preference["Rating of Hospital by Doctor 1"].rank()
This leads to the following table: (image: dataframe consisting of preferences plus the rating and ranking for doctor 1)
Hence, doctor 1 prefers the first hospital over all other hospitals as represented by the ranking.
Now I want to repeat this for every doctor in a loop, creating two new columns per doctor and adding them to my dataframe, but I don't know how to do this. I could type out the same code for all 10 doctors, but if I increase the dataset to 1000 doctors and hospitals this would become impossible. There must be a better way. This would be the same code for doctor 2:
doctors_sorted_by_preference['Rating of Hospital by Doctor 2']=abs(doctors_sorted_by_preference['Preference Doctor'].iloc[1]-doctors_sorted_by_preference['Preference Hospital'])
doctors_sorted_by_preference['Rank of Hospital by Doctor 2']=doctors_sorted_by_preference["Rating of Hospital by Doctor 2"].rank()
Thank you in advance!
Upvotes: 3
Views: 111
Reputation: 507
You can also append the values to a list and then write the list to the dataframe. Appending to lists would be faster if you have a large dataset.
I named my dataframe df for ease of reading:
for i in range(len(df['Preference Doctor'])):
    # Collect doctor i's rating of every hospital in a plain list
    list1 = []
    for j in df['Preference Hospital']:
        list1.append(abs(df['Preference Doctor'].iloc[i] - j))
    # Assign the list directly so values are placed by position, not aligned on the index
    df['Rating of Hospital by Doctor_' + str(i+1)] = list1
    df['Rank of Hospital by Doctor_' + str(i+1)] = df['Rating of Hospital by Doctor_' + str(i+1)].rank()
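For larger datasets, the same ratings and ranks could probably also be computed without an explicit Python loop. The following is only a sketch under a few assumptions: the function name add_doctor_rankings and the small example dataframe are made up for illustration, the column names follow the question ('Rating of Hospital by Doctor N'), and the new columns are grouped (all ratings, then all ranks) rather than interleaved per doctor.

```python
import numpy as np
import pandas as pd


def add_doctor_rankings(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with rating and rank columns for every doctor.

    Hypothetical helper, not part of pandas.
    """
    doctor = df['Preference Doctor'].to_numpy()
    hospital = df['Preference Hospital'].to_numpy()

    # ratings[h, d] = |preference of hospital h - preference of doctor d|
    ratings = np.abs(hospital[:, None] - doctor[None, :])

    n = len(df)
    rating_cols = [f'Rating of Hospital by Doctor {d + 1}' for d in range(n)]
    rank_cols = [f'Rank of Hospital by Doctor {d + 1}' for d in range(n)]

    rating_df = pd.DataFrame(ratings, index=df.index, columns=rating_cols)
    # DataFrame.rank() ranks within each column, i.e. separately per doctor
    rank_df = rating_df.rank()
    rank_df.columns = rank_cols

    # Add all new columns in one step instead of one at a time
    return pd.concat([df, rating_df, rank_df], axis=1)


# Made-up example data, not the data from the question
example = pd.DataFrame({
    'Preference Doctor': [10, 40, 25],
    'Preference Hospital': [12, 30, 45],
})
result = add_doctor_rankings(example)
print(result)
```

Building every column first and concatenating once avoids inserting 2 x N columns one by one, which tends to matter once there are around 1000 doctors.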
Upvotes: 2