Kristy

Reputation: 261

How to speed up binary transformation from integer values

I wrote the following method (in Python 2.7) that generates a set of random integers and transforms them into their binary representation. It takes two self-explanatory parameters, total_num_nodes and dim, and returns a NumPy array containing the binary representation of all these integers, one bit per element:

import random

import numpy as np

def generate(total_num_nodes, dim):

    # Generate random nodes from the range [0, 2**dim - 1]
    nodes_matrix = [random.randint(0, 2 ** dim - 1) for _ in range(total_num_nodes)]

    # Removes duplicates
    nodes_matrix = list(set(nodes_matrix))

    # Transforms each node from an integer to a zero-padded binary string of length dim
    nodes_matrix = [('{0:0' + str(dim) + 'b}').format(x) for x in nodes_matrix]

    # Transforms each bit into an integer.
    nodes_matrix = np.asarray([list(map(int, list(x))) for x in nodes_matrix], dtype=np.uint8)

    return nodes_matrix

The problem is that when I pass very large values, say total_num_nodes = 10,000,000 and dim = 128, the generation takes a really long time. A friend of mine hinted that the following line is actually a bottleneck and is likely responsible for the majority of the computation time:

# Transforms each node from decimal to string representation
nodes_matrix = [('{0:0' + str(dim) + 'b}').format(x) for x in nodes_matrix]

I cannot think of a faster method that can replace this line so that the generation speeds up when running on a single processor. Any suggestion is really appreciated.
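One way to check the hint before replacing anything is to time the two conversion steps separately. The following is only a minimal sketch: the helper names to_strings and to_bits and the reduced size n are assumptions made for illustration, not part of the original method, and the numbers it prints will vary by machine.

```python
import random
import timeit

import numpy as np

dim = 128
n = 2000  # hypothetical reduced size so the measurement finishes quickly

nodes = [random.randint(0, 2 ** dim - 1) for _ in range(n)]

def to_strings(values):
    # Integer -> zero-padded binary string (the step suspected to be slow).
    fmt = '{0:0' + str(dim) + 'b}'
    return [fmt.format(x) for x in values]

def to_bits(strings):
    # Binary string -> row of uint8 bits.
    return np.asarray([list(map(int, s)) for s in strings], dtype=np.uint8)

strings = to_strings(nodes)
t_format = timeit.timeit(lambda: to_strings(nodes), number=3)
t_bits = timeit.timeit(lambda: to_bits(strings), number=3)
print('format: %.3fs, to bits: %.3fs' % (t_format, t_bits))
```

Comparing the two timings shows which step dominates for a given dim.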

Thank you

Upvotes: 0

Views: 39

Answers (1)

Mark Tolonen

Reputation: 177725

Do it all in numpy and it will be faster.

The following generates total_num_nodes rows of dim columns of np.uint8 data, keeps the unique rows by viewing each row as a single record suitable for np.unique, and then converts the result back to a 2D array:

import numpy as np

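def generate(total_num_nodes, dim):
    # Draw every bit independently: total_num_nodes rows of dim bits each.
    a = np.random.choice(np.array([0, 1], dtype=np.uint8), size=(total_num_nodes, dim))
    # Structured dtype with one uint8 field per column, so each row is one record.
    dtype = a.dtype.descr * dim
    temp = a.view(dtype)
    # np.unique on the record array drops duplicate rows.
    uniq = np.unique(temp)
    # View the records as uint8 again and restore the (rows, dim) shape.
    return uniq.view(a.dtype).reshape(-1, dim)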
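If the installed NumPy is 1.13 or newer (an assumption; check np.__version__), the same row-wise deduplication can be written without the structured view by passing axis=0 to np.unique. This is a sketch of an alternative, not the method above: the function name generate_axis0 and the seed parameter are hypothetical, and randint is swapped in for choice.

```python
import numpy as np

def generate_axis0(total_num_nodes, dim, seed=None):
    # seed is a hypothetical convenience parameter for reproducible runs.
    rng = np.random.RandomState(seed)
    # Random bits in {0, 1}; the upper bound 2 is exclusive.
    bits = rng.randint(0, 2, size=(total_num_nodes, dim), dtype=np.uint8)
    # axis=0 treats each row as one item and returns the unique rows, sorted.
    return np.unique(bits, axis=0)
```

For example, generate_axis0(1000, 8) returns at most 256 rows of 8 bits each. Which variant is faster depends on the NumPy version and the sizes involved, so it is worth timing both.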

Upvotes: 1
