Reputation: 842
I have a pandas dataframe named df_merged_population_current_iteration, whose data you can download here as a csv file: https://easyupload.io/bdqso4
Now I want to create a new dataframe called pareto_df that contains all pareto-optimal solutions with regard to the minimization of the 2 objectives "Costs" and "Peak Load" from the dataframe df_merged_population_current_iteration. Further, it should make sure that no duplicate values are stored, meaning that if several solutions have identical values for the 2 objectives "Costs" and "Peak Load", only one of them should be saved. Additionally, there is a check whether the value for "Thermal Discomfort" is smaller than 2. If this is not the case, the solution will not be included in the new pareto_df.
For this purpose, I came up with the following code:
import pandas as pd

df_merged_population_current_iteration = pd.read_csv("C:/Users/wi9632/Desktop/sample_input.csv", sep=";")

# create a new DataFrame to store the Pareto-optimal solutions
pareto_df = pd.DataFrame(columns=df_merged_population_current_iteration.columns)

for i, row in df_merged_population_current_iteration.iterrows():
    is_dominated = False
    is_duplicate = False
    for j, other_row in df_merged_population_current_iteration.iterrows():
        if i == j:
            continue
        # Check if the other solution dominates the current solution
        if (other_row['Costs'] < row['Costs'] and other_row['Peak Load'] < row['Peak Load']) or \
           (other_row['Costs'] <= row['Costs'] and other_row['Peak Load'] < row['Peak Load']) or \
           (other_row['Costs'] < row['Costs'] and other_row['Peak Load'] <= row['Peak Load']):
            # The other solution dominates the current solution
            is_dominated = True
            break
        # Check if the other solution is a duplicate
        if (other_row['Costs'] == row['Costs'] and other_row['Peak Load'] == row['Peak Load']):
            is_duplicate = True
            break
    if not is_dominated and not is_duplicate and row['Thermal Discomfort'] < 2:
        # The current solution is Pareto-optimal, not a duplicate, and meets the discomfort threshold
        row_df = pd.DataFrame([row])
        pareto_df = pd.concat([pareto_df, row_df], ignore_index=True)

print(pareto_df)
In most cases, the code works fine. However, there are cases in which no pareto-optimal solution is added to the new dataframe pareto_df, although there exist pareto-optimal solutions that fulfill the criteria. This can be seen with the data I posted above. You can see that the solutions with the "id of the run" 7 and 8 are pareto-optimal (and fulfill the thermal discomfort constraint). However, the current code does not add either of those 2 to the new dataframe. It should add one of them (but not both, as this would be a duplicate). I have to admit that I already tried a lot and had a closer look at the code, but I could not manage to find the mistake in my code.
Here is the actual output with the uploaded data:
Empty DataFrame
Columns: [Unnamed: 0, id of the run, Costs, Peak Load, Thermal Discomfort, Combined Score]
Index: []
And here is the desired output (one pareto-optimal solution):
Do you see what the mistake might be and how I have to adjust the code such that it in fact finds all pareto-optimal solutions without adding duplicates?
Reminder: Does anyone have any idea why the code does not find all pareto-optimal solutions? I'll highly appreciate any comments.
Upvotes: 1
Views: 590
Reputation: 3633
The condition for testing dominance should be written more strictly. The culprit seems to be the last if clause, where you check both non-dominance and duplication at the same time.
Your old code has a bug: it adds a row to the output DataFrame (pareto_df) only when the row is both non-dominated and not a duplicate. This condition does not work if the input DataFrame contains duplicate rows. If two rows are duplicates, one of them should be added, because they do not dominate each other. The old code flags both of them as duplicates and skips both, hence the empty DataFrame.
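To make the failure concrete, here is a minimal sketch that reproduces it. The two-row input and the helper name flag_rows are made up for illustration (they are not taken from the uploaded csv), and the dominance test is written in a compact form that should be equivalent to the three-clause condition in the question.

```python
import pandas as pd

# Hypothetical input: two runs with identical objective values
df = pd.DataFrame({
    "id of the run": [7, 8],
    "Costs": [100.0, 100.0],
    "Peak Load": [5.0, 5.0],
    "Thermal Discomfort": [1.0, 1.0],
})

def flag_rows(df):
    """Return (is_dominated, is_duplicate) for each row, mirroring the question's loop."""
    flags = []
    for i, row in df.iterrows():
        is_dominated = False
        is_duplicate = False
        for j, other in df.iterrows():
            if i == j:
                continue
            # No worse in both objectives and strictly better in at least one
            no_worse = other["Costs"] <= row["Costs"] and other["Peak Load"] <= row["Peak Load"]
            better = other["Costs"] < row["Costs"] or other["Peak Load"] < row["Peak Load"]
            if no_worse and better:
                is_dominated = True
                break
            if other["Costs"] == row["Costs"] and other["Peak Load"] == row["Peak Load"]:
                is_duplicate = True
                break
        flags.append((is_dominated, is_duplicate))
    return flags

flags = flag_rows(df)
print(flags)  # [(False, True), (False, True)]
```

Neither row is dominated, but each one sees the other as a duplicate, so the check "not is_dominated and not is_duplicate" rejects both of them.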
Remember that a point should be added to the pareto dataframe only if it remains undominated. Duplicates in the output are then handled through drop_duplicates.
import pandas as pd

df_merged_population_current_iteration = pd.read_csv("C:/Users/wi9632/Desktop/sample_input.csv", sep=";")

# create a new DataFrame to store the Pareto-optimal solutions
pareto_df = pd.DataFrame(columns=df_merged_population_current_iteration.columns)

for i, row in df_merged_population_current_iteration.iterrows():
    is_dominated = False
    for j, other_row in df_merged_population_current_iteration.iterrows():
        if i == j:
            continue
        # Check if the other solution dominates the current solution
        # (here: strictly smaller in both objectives)
        if (other_row['Costs'] < row['Costs'] and other_row['Peak Load'] < row['Peak Load']):
            # The other solution dominates the current solution and hence row cannot be added to pareto set.
            is_dominated = True
            break
    if not is_dominated and row['Thermal Discomfort'] < 2:
        # The current solution is Pareto-optimal, and meets the discomfort threshold
        row_df = pd.DataFrame([row])
        pareto_df = pd.concat([pareto_df, row_df], ignore_index=True)

# Keep only one row per identical ('Costs', 'Peak Load') pair; the other
# columns (e.g. "id of the run") differ between such rows, so restrict the
# comparison to the two objectives
pareto_df = pareto_df.drop_duplicates(subset=['Costs', 'Peak Load'])
print(pareto_df)
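As an alternative, here is a rough sketch of the same idea without the nested iterrows loops. It is only an illustration under some assumptions: the function name pareto_front, its parameters and the demo data are hypothetical, and it uses the usual definition of dominance (no worse in both objectives and strictly better in at least one) rather than the "strictly smaller in both" test in the loop above, so the two versions can differ for rows that tie in exactly one objective.

```python
import numpy as np
import pandas as pd

def pareto_front(df, objectives=("Costs", "Peak Load"),
                 constraint_col="Thermal Discomfort", limit=2):
    """Sketch: non-dominated rows (minimization), filtered and de-duplicated."""
    cols = list(objectives)
    values = df[cols].to_numpy(dtype=float)
    keep = np.ones(len(df), dtype=bool)
    for i, point in enumerate(values):
        # A row dominates `point` if it is no worse in every objective
        # and strictly better in at least one
        no_worse = (values <= point).all(axis=1)
        better = (values < point).any(axis=1)
        keep[i] = not (no_worse & better).any()
    front = df[keep]
    # Apply the discomfort threshold after the dominance check, as in the loop version
    front = front[front[constraint_col] < limit]
    # Keep one row per identical objective pair
    return front.drop_duplicates(subset=cols).reset_index(drop=True)

# Hypothetical demo data (not the uploaded csv)
demo = pd.DataFrame({
    "id of the run": [1, 2, 7, 8, 9],
    "Costs": [120.0, 100.0, 90.0, 90.0, 80.0],
    "Peak Load": [6.0, 7.0, 5.0, 5.0, 9.0],
    "Thermal Discomfort": [0.5, 0.5, 1.0, 1.5, 3.0],
})
result = pareto_front(demo)
print(result)
```

In this demo, runs 1 and 2 are dominated by run 7, run 9 is non-dominated but fails the discomfort threshold, and of the identical runs 7 and 8 only the first one is kept.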
Upvotes: 0