Reut

Reputation: 1592

Pareto front for matplotlib scatter plot

I'm trying to add a Pareto front to a scatter plot I have. The scatter plot data is:

array([[1.44100000e+04, 3.31808987e+07],
       [1.21250000e+04, 3.22901074e+07],
       [6.03000000e+03, 2.84933900e+07],
       [8.32500000e+03, 2.83091317e+07],
       [6.68000000e+03, 2.56373373e+07],
       [5.33500000e+03, 1.89331461e+07],
       [3.87500000e+03, 1.84107940e+07],
       [3.12500000e+03, 1.60416570e+07],
       [6.18000000e+03, 1.48054565e+07],
       [4.62500000e+03, 1.33395341e+07],
       [5.22500000e+03, 1.23150492e+07],
       [3.14500000e+03, 1.20244820e+07],
       [6.79500000e+03, 1.19525083e+07],
       [2.92000000e+03, 9.18176770e+06],
       [5.45000000e+02, 5.66882578e+06]])

and the scatter plot looks like this:

enter image description here

I used this tutorial to plot the Pareto front, but for some reason the result is very weird and I get only a tiny red line:

enter image description here

This is the code I have used:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def identify_pareto(scores):
    # Count number of items
    population_size = scores.shape[0]

    # Create a NumPy index for scores on the pareto front (zero indexed)
    population_ids = np.arange(population_size)

    # Create a starting list of items on the Pareto front
    # All items start off as being labelled as on the Pareto front
    pareto_front = np.ones(population_size, dtype=bool)
    print(pareto_front)
    # Loop through each item. This will then be compared with all other items
    for i in range(population_size):
        
        # Loop through all other items
        for j in range(population_size):
            
            # Check if our 'i' point is dominated by our 'j' point
            if all(scores[j] >= scores[i]) and any(scores[j] > scores[i]):
               
                # j dominates i. Label 'i' point as not on Pareto front
                pareto_front[i] = 0
                # Stop further comparisons with 'i' (no more comparisons needed)
                break
    # Return ids of scenarios on pareto front
    return population_ids[pareto_front]


pareto = identify_pareto(scores)

pareto_front_df = pd.DataFrame(pareto_front)
pareto_front_df.sort_values(0, inplace=True)
pareto_front = pareto_front_df.values

# Here I get weird results as output:
>>>
array([[ 5, 81],
       [15, 80],
       [30, 79],
       [55, 77],
       [70, 65],
       [80, 60],
       [90, 40],
       [97, 23],
       [99,  4]])

x_all = scores[:, 0]
y_all = scores[:, 1]
x_pareto = pareto_front[:, 0]
y_pareto = pareto_front[:, 1]

plt.scatter(x_all, y_all)
plt.plot(x_pareto, y_pareto, color='r')
plt.xlabel('Objective A')
plt.ylabel('Objective B')
plt.show()

The result is the tiny red line.

My question is: where is my mistake, and how can I get the Pareto line?

Upvotes: 1

Views: 1920

Answers (1)

dufrmbgr

Reputation: 407

I don't think there is anything wrong with your code; the issue is rather in the data that scores holds (assuming scores is the first array you presented).

The first element of the array, [1.44100000e+04, 3.31808987e+07], is the largest in both columns. It is therefore the only value of i in the outer loop for which the condition if all(scores[j] >= scores[i]) and any(scores[j] > scores[i]): is never met, so it is the only point whose flag is not set to zero. Every other point is dominated by it and has its flag set to zero.
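This can be checked directly against the posted data. The following is a minimal sketch, not the tutorial's function: is_dominated is a hypothetical helper that applies the same dominance condition row by row, assuming both objectives are to be maximised as in the question's code.

```python
import numpy as np

# Data copied from the question.
scores = np.array([
    [1.44100000e+04, 3.31808987e+07],
    [1.21250000e+04, 3.22901074e+07],
    [6.03000000e+03, 2.84933900e+07],
    [8.32500000e+03, 2.83091317e+07],
    [6.68000000e+03, 2.56373373e+07],
    [5.33500000e+03, 1.89331461e+07],
    [3.87500000e+03, 1.84107940e+07],
    [3.12500000e+03, 1.60416570e+07],
    [6.18000000e+03, 1.48054565e+07],
    [4.62500000e+03, 1.33395341e+07],
    [5.22500000e+03, 1.23150492e+07],
    [3.14500000e+03, 1.20244820e+07],
    [6.79500000e+03, 1.19525083e+07],
    [2.92000000e+03, 9.18176770e+06],
    [5.45000000e+02, 5.66882578e+06],
])


def is_dominated(i, scores):
    """Hypothetical helper: True if some row is >= row i in every
    column and strictly > in at least one column."""
    ge_everywhere = np.all(scores >= scores[i], axis=1)
    gt_somewhere = np.any(scores > scores[i], axis=1)
    return bool(np.any(ge_everywhere & gt_somewhere))


# Indices of the rows that no other row dominates.
non_dominated = [i for i in range(len(scores)) if not is_dominated(i, scores)]
print(non_dominated)  # prints [0]
```

Under this "larger is better in both columns" definition, the front for this data consists of the first row only.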

I believe this is the only point that gets plotted, as a red dot.
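One further thing worth checking: the snippet in the question passes pareto_front to pd.DataFrame without showing where it is assigned, and the printed values (5 to 99) do not occur anywhere in scores. Assuming the intent was to take the rows of scores at the indices returned by identify_pareto, that step might look like the sketch below. The pareto array here is hard-coded as an assumption about what the function returns for this data, and only the first three rows of the data are used to keep it short.

```python
import numpy as np

# First three rows of the question's data, enough to illustrate the indexing.
scores = np.array([
    [1.44100000e+04, 3.31808987e+07],
    [1.21250000e+04, 3.22901074e+07],
    [6.03000000e+03, 2.84933900e+07],
])

# Assumption: the index array identify_pareto(scores) returns for this data.
pareto = np.array([0])

# Take the front's coordinates from scores itself,
# then sort the rows by the first objective.
pareto_front = scores[pareto]
pareto_front = pareto_front[np.argsort(pareto_front[:, 0])]

x_pareto = pareto_front[:, 0]
y_pareto = pareto_front[:, 1]
print(pareto_front.shape)  # prints (1, 2)
```

With only one point on the front, plt.plot has no line segment to draw, so passing marker='o' to plt.plot, or using plt.scatter for the front, would make the point visible.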

Upvotes: 1
