Jeff

Reputation: 123

I want to create new dataframe columns looping over rows of a specific column

I am trying to implement the Gale-Shapley algorithm in Python to produce stable matches between doctors and hospitals. To do so, I gave every doctor and every hospital a random preference, represented by a number.

Dataframe consisting of preferences

Afterwards, I wrote code that rates every hospital for one specific doctor (identified by ID) and then ranks those ratings, which creates two new columns. The rating of a match is the absolute value of the difference between the two preferences, where a lower value means a better match. This is the code for the first doctor:

    doctors_sorted_by_preference['Rating of Hospital by Doctor 1']=abs(doctors_sorted_by_preference['Preference Doctor'].iloc[0]-doctors_sorted_by_preference['Preference Hospital'])
    doctors_sorted_by_preference['Rank of Hospital by Doctor 1']=doctors_sorted_by_preference["Rating of Hospital by Doctor 1"].rank()

which leads to the following table: Dataframe consisting of preferences and rating + ranking of doctor

Hence, doctor 1 prefers the first hospital over all other hospitals as represented by the ranking.

Now I want to repeat this for every doctor in a loop, creating two new columns per doctor and adding them to my dataframe, but I don't know how to do this. I could type out the same code for all 10 doctors, but that would become impractical if I increase the dataset to 1000 doctors and hospitals. There must be a better way. This would be the same code for doctor 2:

    doctors_sorted_by_preference['Rating of Hospital by Doctor 2']=abs(doctors_sorted_by_preference['Preference Doctor'].iloc[1]-doctors_sorted_by_preference['Preference Hospital'])
    doctors_sorted_by_preference['Rank of Hospital by Doctor 2']=doctors_sorted_by_preference["Rating of Hospital by Doctor 2"].rank()

Thank you in advance!

Upvotes: 3

Views: 111

Answers (1)

Sumanth

Reputation: 507

You can append the values to a list and then write that list to the dataframe as a new column. With a large dataset, appending to a list is usually faster than writing values into the dataframe one at a time.

I named my dataframe df for readability:

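    for i in range(len(df)):
        doctor_pref = df['Preference Doctor'].iloc[i]
        # Rating of every hospital by doctor i+1: absolute preference difference
        ratings = []
        for hospital_pref in df['Preference Hospital']:
            ratings.append(abs(doctor_pref - hospital_pref))
        # Assign the list directly so values are placed by position;
        # wrapping it in pd.DataFrame would align on index labels instead
        rating_col = 'Rating of Hospital by Doctor_' + str(i + 1)
        df[rating_col] = ratings
        df['Rank of Hospital by Doctor_' + str(i + 1)] = df[rating_col].rank()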
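
As a possible alternative to the loop above, here is a minimal sketch that computes all ratings at once with NumPy broadcasting. It is not part of the original answer. The toy data in df is made up for illustration, and it assumes, as in the question, that doctors and hospitals share the same rows. The names ratings, rating_df, rank_df and result are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names follow the question.
df = pd.DataFrame({
    "Preference Doctor": [20, 90, 50],
    "Preference Hospital": [10, 60, 80],
})

doctor = df["Preference Doctor"].to_numpy()
hospital = df["Preference Hospital"].to_numpy()

# ratings[j, i] = |preference of doctor i - preference of hospital j|
# Broadcasting a column vector against a row vector gives the full matrix.
ratings = np.abs(hospital[:, None] - doctor[None, :])

n = len(df)
rating_cols = [f"Rating of Hospital by Doctor_{i + 1}" for i in range(n)]
rank_cols = [f"Rank of Hospital by Doctor_{i + 1}" for i in range(n)]

rating_df = pd.DataFrame(ratings, index=df.index, columns=rating_cols)

# DataFrame.rank() ranks within each column, i.e. per doctor.
rank_df = rating_df.rank()
rank_df.columns = rank_cols

# Attach all new columns in one step instead of one insert per column.
result = pd.concat([df, rating_df, rank_df], axis=1)
print(result)
```

Note that this sketch groups all rating columns before all rank columns, whereas the loop interleaves them. With 1000 doctors it builds a 1000 x 1000 matrix, which should still fit comfortably in memory.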

Upvotes: 2
