Reputation: 165
I'm trying to vectorize my fitness function for a Minimum Vector Cover genetic algorithm, but I'm at a loss about how to do it.
As it stands now:
vert_cover_fitness = [1 if self.dna[edge[0]] or self.dna[edge[1]] else -num_edges for edge in edges]
The dna
is a one-dimensional binary array of size [0..n]
, where each index corresponds to a vertex, and its value indicates if we have chosen it or not. edges
is a two dimensional positive integer array, where each value corresponds to a vertex (index) in dna
. Both are ndarray
s.
Simply explained - if one of the vertices connected by an edge is "selected", then we get a score of one. If not, the function is penalized by -num_edges
.
I have tried np.vectorize
as an attempt to get away cheap with a lambda function:
fit_func = np.vectorize(lambda edge: 1 if self.dna[edge[0]] or self.dna[edge[1]] else -num_edges)
vert_cover_fitness = fit_func(edges)
This returns IndexError: invalid index to scalar variable.
, as this function is applied to each value, and not each row.
To fix this I tried np.apply_along_axis
. This works but it's just a wrapper for a loop so I'm not getting any speedups.
If any Numpy wizards can see some obvious way to do this, I would much appreciate your help. I'm guessing a problem lies with the representation of the problem, and that changing either the dna
or edges
shapes could help. I'm just not skilled enough to see what I should do.
Upvotes: 1
Views: 113
Reputation: 1450
I came up with this bit of numpy code, it runs 30x faster than your for loop on my randomly generated data.
import numpy as np
num_vertices = 1000
num_edges = 500
dna = np.random.choice([0, 1], num_vertices)
edges = np.random.randint(0, num_vertices, num_edges * 2).reshape(-1, 2)
vert_cover_fitness1 = [1 if dna[edge[0]] or dna[edge[1]] else -num_edges for edge in edges]
vert_cover_fitness2 = np.full([num_edges], -num_edges)
mask = (dna[edges[:, 0]] | dna[edges[:, 1]]).astype(bool)
vert_cover_fitness2[mask] = 1.0
print((vert_cover_fitness1 == vert_cover_fitness2).all()) # this shows it's correct
Here is the timeit code used to measure the speedup.
import timeit
setup = """
import numpy as np
num_vertices = 1000
num_edges = 500
dna = np.random.choice([0, 1], num_vertices)
edges = np.random.randint(0, num_vertices, num_edges*2).reshape(-1, 2)
"""
python_loop = "[1 if dna[edge[0]] or dna[edge[1]] else -num_edges for edge in edges]"
print(timeit.timeit(python_loop, setup, number=1000))
vectorised="""
vert_cover_fitness2 = np.full([num_edges], -num_edges)
mask = (dna[edges[:, 0]] | dna[edges[:, 1]]).astype(bool)
vert_cover_fitness2[mask] = 1.0
"""
print(timeit.timeit(vectorised, setup, number=1000))
# prints:
# 0.375906624016352
# 0.012783741112798452
Upvotes: 3