Mats
Mats

Reputation: 165

How to vectorize a function with array lookups

I'm trying to vectorize my fitness function for a Minimum Vector Cover genetic algorithm, but I'm at a loss about how to do it.

As it stands now:

vert_cover_fitness = [1 if self.dna[edge[0]] or self.dna[edge[1]] else -num_edges for edge in edges]

The dna is a one-dimensional binary array of size [0..n], where each index corresponds to a vertex, and its value indicates if we have chosen it or not. edges is a two dimensional positive integer array, where each value corresponds to a vertex (index) in dna. Both are ndarrays.

Simply explained - if one of the vertices connected by an edge is "selected", then we get a score of one. If not, the function is penalized by -num_edges.

I have tried np.vectorize as an attempt to get away cheap with a lambda function:

fit_func = np.vectorize(lambda edge: 1 if self.dna[edge[0]] or self.dna[edge[1]] else -num_edges)
vert_cover_fitness = fit_func(edges)

This returns IndexError: invalid index to scalar variable., as this function is applied to each value, and not each row.

To fix this I tried np.apply_along_axis. This works but it's just a wrapper for a loop so I'm not getting any speedups.

If any Numpy wizards can see some obvious way to do this, I would much appreciate your help. I'm guessing a problem lies with the representation of the problem, and that changing either the dna or edges shapes could help. I'm just not skilled enough to see what I should do.

Upvotes: 1

Views: 113

Answers (1)

Addy
Addy

Reputation: 1450

I came up with this bit of numpy code, it runs 30x faster than your for loop on my randomly generated data.

import numpy as np
num_vertices = 1000
num_edges = 500
dna = np.random.choice([0, 1], num_vertices)
edges = np.random.randint(0, num_vertices, num_edges * 2).reshape(-1, 2)

vert_cover_fitness1 = [1 if dna[edge[0]] or dna[edge[1]] else -num_edges for edge in edges]

vert_cover_fitness2 = np.full([num_edges], -num_edges)
mask = (dna[edges[:, 0]] | dna[edges[:, 1]]).astype(bool)
vert_cover_fitness2[mask] = 1.0

print((vert_cover_fitness1 == vert_cover_fitness2).all()) # this shows it's correct

Here is the timeit code used to measure the speedup.

import timeit

setup = """
import numpy as np
num_vertices = 1000
num_edges = 500
dna = np.random.choice([0, 1], num_vertices)
edges = np.random.randint(0, num_vertices, num_edges*2).reshape(-1, 2)
"""

python_loop = "[1 if dna[edge[0]] or dna[edge[1]] else -num_edges for edge in edges]"

print(timeit.timeit(python_loop, setup, number=1000))

vectorised="""
vert_cover_fitness2 = np.full([num_edges], -num_edges)
mask = (dna[edges[:, 0]] | dna[edges[:, 1]]).astype(bool)
vert_cover_fitness2[mask] = 1.0
"""

print(timeit.timeit(vectorised, setup, number=1000))

# prints:
# 0.375906624016352
# 0.012783741112798452

Upvotes: 3

Related Questions