Reputation: 378
I'm programming a genetic algorithm in Python, however, my operator (MMX) takes too long (10 seconds) to execute for individuals with 3 million weights (each individual is a list of 3.000.000 elements).
This is the code for the operator:
def calc_gen(maxel, minel, rec1, rec2, phiC):
g = maxel - minel
phi = 0
if g > phiC:
# Recta 2
phi = rec2[0] * g + rec2[1]
elif g < phiC:
# Recta 1
phi = rec1[0] * g + rec1[1]
#Hay que asegurarse que no nos salimos del rango:
maxv = min(1, maxel - phi)
minv = max(0, minel + phi)
gen1 = random.uniform(minv, maxv) # Guardar el gen del primer hijo
# Si C es el centro y A el elemento que ya tenemos y B el simétrico de A: C - A + C = B -> 2C - A = B
# C = (maxv + minv) / 2; 2C - A = B -> maxv + minv - A = B
# center = (maxv + minv) / 2
gen2 = maxv + minv - gen1
return gen1, gen2
#return gen1, maxv + minv - gen1
def cxMMX(poblacion, rec1, rec2, phiC):
start = timer()
# Calcular el maximo y el minimo de cada gen en toda la población
max_genes = numpy.amax(poblacion, axis=0).tolist()
min_genes = numpy.amin(poblacion, axis=0).tolist()
gis = timer()
hijo1 = Individual()
hijo2 = Individual()
# Iterar dos listas a la vez (zip) con su indice (enumerate). Así crearemos los hijos simultáneamente en un loop
for i, (maxel, minel) in enumerate(zip(max_genes, min_genes)):
gen1, gen2 = calc_gen(maxel, minel, rec1, rec2, phiC)
hijo1.append(gen1)
hijo2.append(gen2)
end = timer()
#print("Tiempo Gi: %f Tiempo init: %f Tiempo calc gen: %f Tiempo mate total: %f" % (gis-start, init-gis, end-init, end-start))
return [hijo1, hijo2]
rec1, rec2 and phiC are parameters that determine how the crossover is done, you shouldn't bother about them. They have the same value all across the algorithm.
poblacion is a list of lists, lets say it's shape is [7,3000000]. Individual() is a custom class. It is basically inheriting "list" and adding some attributes to store the fitness value.
Doing numpy.amax and numpy.amin separately seems like doing extra work. Also, there's probably a more pythonic way to do the "calc_gen()" loop.
PD: "gen1" depends on "gen2": gen1 obtained randomly within a range, and then gen2 is obtained looking for the symmetrical point.
PD2: A more detailed explanation on MMX operator can be found on the original paper, however, you can assume the code is okay and does what it has to do. The doi is https://doi.org/10.1007/3-540-44522-6_73
PD: the enumerate() and the i are there from the old code, forgot to remove them!
EDIT: reduced 20% time with Dillon Davis's solution. It's a pretty clean solution which will work with any custom list building function, provided you obtain each value of the list by executing one function:
def calc_gen_v2(maxel,minel, rec1m, rec1b, rec2m, rec2b, phiC):
g = maxel - minel
phi = 0
if g > phiC:
# Recta 2
phi = rec2m * g + rec2b
elif g < phiC:
# Recta 1
phi = rec1m * g + rec1b
#Hay que asegurarse que no nos salimos del rango:
maxv = min(1, maxel - phi)
minv = max(0, minel + phi)
gen1 = random.uniform(minv, maxv) # Guardar el gen del primer hijo
# Si C es el centro y A el elemento que ya tenemos y B el simétrico de A: C - A + C = B -> 2C - A = B
# C = (maxv + minv) / 2; 2C - A = B -> maxv + minv - A = B
# center = (maxv + minv) / 2
gen2 = maxv + minv - gen1
return gen1, gen2
def cxMMX_v3(poblacion, rec1, rec2, phiC):
start = timer()
# Calcular el maximo y el minimo de cada gen en toda la población
max_genes = numpy.amax(poblacion, axis=0)
min_genes = numpy.amin(poblacion, axis=0)
gis = timer()
hijo1, hijo2 = map(Individual, numpy.vectorize(calc_gen_v2)(max_genes, min_genes, rec1[0], rec1[1], rec2[0], rec2[1], phiC))
end = timer()
#print("Tiempo Gi: %f Tiempo init: %f Tiempo calc gen: %f Tiempo mate total: %f" % (gis-start, init-gis, end-init, end-start))
return [hijo1, hijo2]
EDIT 2: as Dillon Davis suggested I implemented it in pure numpy, reducing the time to 3,5 seconds! (65% time save)
def cxMMX_numpy(poblacion, rec1, rec2, phiC):
# Calculate max and min for every gen in the population
max_genes = numpy.amax(poblacion, axis=0)
min_genes = numpy.amin(poblacion, axis=0)
g_pop = numpy.subtract(max_genes, min_genes)
phi_pop = numpy.where(g_pop < phiC, numpy.multiply(g_pop, rec1[0]) + rec1[1], numpy.where(g_pop > phiC, numpy.multiply(g_pop, rec2[0]) + rec2[1], 0))
maxv = numpy.minimum(numpy.subtract(max_genes, phi_pop), 1)
minv = numpy.maximum(numpy.sum([min_genes, phi_pop], axis=0), 0)
hijo1 = numpy.random.uniform(low=minv, high=maxv, size=minv.size)
hijo2 = numpy.subtract(numpy.sum([maxv, minv], axis=0), hijo1)
return [Individual(hijo1), Individual(hijo2)]
NOTE: In case you want to reuse, Individual inherits from list
NOTE: if g=phiC then rec1[0] * g_pop + rec1[1]=0, always, rec1[0] and rec1[1] guarantee that! so maybe it is better to do the math instead of a triple option?
Upvotes: 1
Views: 118
Reputation: 7750
Try replacing your for loop in cxMMX()
with something like:
hijo1, hijo2 = map(Individual, numpy.vectorize(calc_gen)(max_genes, min_genes, rec1, rec2, phiC))
and drop the .tolist()
from your numpy.amin()
and numpy.amax()
.
This will vectorize your your calc_gen function, avoid a zip and the function overhead from several .append()
calls, and should overall be quite a bit faster.
Edit:
Also consider converting calc_gen()
to work directly on the numpy arrays. Replace calls to random.uniform()
with numpy.random.uniform()
, min()
or max()
with numpy.minimum()
or numpy.maximum()
, and then eliminate the for loop / map + vectorize completely. This would ultimately be the fastest option.
Upvotes: 3
Reputation: 50205
Have you tried using a multiprocessing.Pool
?
You'd need to make a wrapper for calc_gen
first:
# after calc_gen def
def get_calc_gen(rec1, rec2, phiC):
return lambda maxel, minel: calc_gen(maxel, minel, rec1, rec2, phiC)
Then instead of the for
loop you'd do something like:
# replacing for loop section
cgen = get_calc_gen(rec1, rec2, phiC)
minmax_genes = zip(max_genes, min_genes)
pool = multiprocessing.Pool()
mapped_genes = pool.map(cgen, minmax_genes)
for gen1, gen2 in mapped_genes:
hijo1.append(gen1)
hijo2.append(gen2)
P.S. You don't need enumerate
in your original code since you don't seem to be using i
Upvotes: 1