The context is the initialization of optimization algorithms (for example, differential evolution), where we generate a random population of vectors that represent a solution's parameters.
We are trying to devise new initialization methods that are better than random. Regardless of the initialization method itself, in the end we have M candidate solutions from which we want to keep only N (typically M is more than 100 samples and N is around 30).
In simple terms, the main idea is to find a subset of N solutions drawn from the M candidates that maximizes the distance between solutions while the overall fitness stays good (for our algorithm, the lower the fitness value, the better the solution). In other words: maximize the sum of pairwise distances within the subset while minimizing the sum of its fitness values.
What we are using right now is quite simple: we generate all combinations of N solutions, measure the overall distance and fitness of each combination, draw the Pareto frontier over all of these points, and select one of the points on the frontier:
import math
import itertools
import numpy as np
import matplotlib.pyplot as plt
from paretoset import paretoset
def distance_matrix(a):
    # pairwise Euclidean distances via broadcasting:
    # (M, 1, D) - (M, D) broadcasts to (M, M, D); einsum sums the squares over D
    b = a.reshape(a.shape[0], 1, a.shape[1])
    dist = np.sqrt(np.einsum('ijk, ijk->ij', a - b, a - b))
    return dist
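# Sanity check (a sketch, assuming SciPy is available): an equivalent matrix is
#   from scipy.spatial.distance import pdist, squareform
#   dist_check = squareform(pdist(samples))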
samples = np.random.random([10, 2])*10.0
fitness = [math.dist(p, [0,0]) for p in samples]
n_pop = 3
for s, f in zip(samples, fitness):
    print(f'{s}({f})')
# additional code - get best n_pop points that are far away from each other
# compute distance matrix
dist = distance_matrix(samples)
print(f'{dist}')
# compute the aggregated distance and fitness of every candidate subset of size n_pop
aggregated_distances = []
aggregated_fitness = []
solutions = []
combs = itertools.combinations(range(len(samples)), n_pop)
print('main loop')
for c in combs:
    # store current solution
    solutions.append(c)
    # compute the aggregated distance; this sums every ordered pair (i, j),
    # so each pairwise distance is counted twice, which only rescales the objective
    agg_dist = 0.0
    for i in c:
        for j in c:
            agg_dist += dist[i][j]
    aggregated_distances.append(agg_dist)
    # compute the aggregated fitness
    agg_fit = 0.0
    for s in c:
        agg_fit += fitness[s]
    aggregated_fitness.append(agg_fit)
aggregated_distances = np.array(aggregated_distances)
aggregated_fitness = np.array(aggregated_fitness)
solutions = np.array(solutions)
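# Sketch of a vectorized alternative to the aggregation loop above (same
# double-counting convention; assumes solutions has already been made an array):
#   aggregated_fitness = np.asarray(fitness)[solutions].sum(axis=1)
#   aggregated_distances = dist[solutions[:, :, None], solutions[:, None, :]].sum(axis=(1, 2))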
plt.plot(aggregated_distances, aggregated_fitness, 'o')
for s, d, f in zip(solutions, aggregated_distances, aggregated_fitness):
    print(f'{s} {d} {f}')
#https://github.com/tommyod/paretoset
objective_values_array = np.vstack([aggregated_distances, aggregated_fitness]).T
print(f'{objective_values_array.shape}')
mask = paretoset(objective_values_array, sense=['max', 'min'])
print(f'{mask}')
print(solutions[mask])
plt.plot(aggregated_distances[mask], aggregated_fitness[mask], 'o')
plt.show()
The issue is that this code is quite slow when the population is reasonably large. Is there a better method, ideally one that scales roughly linearly with the population size? Could linear programming be a solution?
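For context on why the brute force does not scale: the number of candidate subsets grows as C(M, N), which is already astronomical for the sizes mentioned above. A quick check using only the standard library:

import math
# number of subsets of size 30 drawn from 100 candidates
print(math.comb(100, 30))  # roughly 3e25 combinations, far too many to enumerate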