How to find the Pareto-optimal solutions in a pandas dataframe

Question

I have a pandas dataframe named df_merged_population_current_iteration. You can download its data as a csv file here: https://easyupload.io/bdqso4

Now I want to create a new dataframe called pareto_df that contains all Pareto-optimal solutions from df_merged_population_current_iteration with regard to the minimization of the two objectives "Costs" and "Peak Load". It should not store duplicates: if several solutions have identical values for both objectives, only one of them should be kept. Additionally, a solution should only be included in pareto_df if its value for "Thermal Discomfort" is smaller than 2.
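To make the dominance criterion concrete, here is a minimal sketch of what I mean by one solution dominating another when both objectives are minimized. The helper name `dominates` and the numbers are only for illustration; they are not part of my actual code or data.

```python
def dominates(a, b):
    """Return True if solution a dominates solution b (all objectives minimized).

    a and b are sequences of objective values, e.g. (costs, peak_load).
    """
    # a must be no worse than b in every objective ...
    no_worse = all(x <= y for x, y in zip(a, b))
    # ... and strictly better in at least one of them
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


# Illustrative (made-up) objective pairs: (Costs, Peak Load)
print(dominates((5.0, 3.0), (8.0, 6.0)))  # better in both objectives
print(dominates((5.0, 3.0), (5.0, 3.0)))  # identical values: neither dominates
print(dominates((5.0, 6.0), (8.0, 3.0)))  # trade-off: neither dominates
```

Under this definition, two solutions with identical objective values do not dominate each other, so both are Pareto-optimal and only the duplicate rule decides which one is kept.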

For this purpose, I came up with the following code:

import pandas as pd

df_merged_population_current_iteration = pd.read_csv("C:/Users/wi9632/Desktop/sample_input.csv", sep=";")

# create a new DataFrame to store the Pareto-optimal solutions
pareto_df = pd.DataFrame(columns=df_merged_population_current_iteration.columns)

for i, row in df_merged_population_current_iteration.iterrows():
    is_dominated = False
    is_duplicate = False
    for j, other_row in df_merged_population_current_iteration.iterrows():
        if i == j:
            continue
        # Check if the other solution dominates the current solution
        if (other_row['Costs'] < row['Costs'] and other_row['Peak Load'] < row['Peak Load']) or \
                (other_row['Costs'] <= row['Costs'] and other_row['Peak Load'] < row['Peak Load']) or \
                (other_row['Costs'] < row['Costs'] and other_row['Peak Load'] <= row['Peak Load']):
            # The other solution dominates the current solution
            is_dominated = True
            break
        # Check if the other solution is a duplicate
        if (other_row['Costs'] == row['Costs'] and other_row['Peak Load'] == row['Peak Load']):
            is_duplicate = True
            break

    if not is_dominated and not is_duplicate and row['Thermal Discomfort'] < 2:
        # The current solution is Pareto-optimal, not a duplicate, and meets the discomfort threshold
        row_df = pd.DataFrame([row])
        pareto_df = pd.concat([pareto_df, row_df], ignore_index=True)

print(pareto_df)

In most cases, the code works fine. However, there are cases in which no Pareto-optimal solution is added to the new dataframe pareto_df, although Pareto-optimal solutions that fulfill the criteria exist. This can be seen with the data I posted above: the solutions with the "id of the run" 7 and 8 are Pareto-optimal (and fulfill the thermal discomfort constraint), yet the current code adds neither of them to the new dataframe. It should add exactly one of them (not both, since they are duplicates of each other). I have already tried a lot and looked closely at the code, but I could not find the mistake.

Here is the actual output with the uploaded data:

Empty DataFrame
Columns: [Unnamed: 0, id of the run, Costs, Peak Load, Thermal Discomfort, Combined Score]
Index: []

And here is the desired output (one pareto-optimal solution):

Do you see what the mistake might be, and how I have to adjust the code so that it finds all Pareto-optimal solutions without adding duplicates?

Reminder: Does anyone have an idea why the code does not find all Pareto-optimal solutions? I'd highly appreciate any comments.

Answers (1)
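
A likely cause, judging from the posted code: the duplicate check compares each row against every other row, so when two rows have identical "Costs" and "Peak Load" (as runs 7 and 8 do), each one sees the other as a duplicate and both are discarded. Duplicates should only be removed once, after the non-dominated rows have been found.

Below is a minimal sketch of one way to do that, not a definitive implementation. The function name `pareto_front`, its parameters, and the small example dataframe are hypothetical (the values are made up, not taken from the uploaded file). As in the original code, dominance is evaluated against all rows, including those that later fail the discomfort threshold; if infeasible rows should not be allowed to dominate feasible ones, filter the dataframe first.

```python
import pandas as pd


def pareto_front(df, objectives=("Costs", "Peak Load"),
                 constraint_col="Thermal Discomfort", max_discomfort=2):
    """Return the non-dominated rows (minimization), one row per objective pair."""
    objectives = list(objectives)
    values = df[objectives].to_numpy(dtype=float)

    # Pairwise comparison: entry [i, j] describes row j relative to row i.
    # Row j dominates row i if it is <= in every objective and < in at least one.
    less_equal = (values[None, :, :] <= values[:, None, :]).all(axis=2)
    strictly_less = (values[None, :, :] < values[:, None, :]).any(axis=2)
    is_dominated = (less_equal & strictly_less).any(axis=1)

    # Keep non-dominated rows that satisfy the discomfort threshold
    feasible = df[constraint_col].to_numpy() < max_discomfort
    candidates = df[~is_dominated & feasible]

    # Remove duplicates only now, keeping the first row of each objective pair
    return candidates.drop_duplicates(subset=objectives).reset_index(drop=True)


# Made-up example: runs 7 and 8 are identical and dominate runs 1 and 2
example = pd.DataFrame({
    "id of the run": [1, 2, 7, 8],
    "Costs": [10.0, 8.0, 5.0, 5.0],
    "Peak Load": [4.0, 6.0, 3.0, 3.0],
    "Thermal Discomfort": [0.5, 1.0, 1.5, 1.5],
})
pareto_df = pareto_front(example)
print(pareto_df)
```

With this example, only one of the two identical runs remains in pareto_df. The pairwise comparison needs memory proportional to the square of the number of rows, which is fine for a population of a few thousand solutions but may need a loop-based variant for much larger dataframes.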
