Reputation: 261
I wrote the following function (in Python 2.7) that generates a set of random integers and converts them to their binary representation. It takes two self-explanatory parameters, total_num_nodes and dim, and returns a NumPy array containing the binary representation of all these integers:
import random
import numpy as np

def generate(total_num_nodes, dim):
    # Generate random nodes from the range [0, 2**dim - 1]
    nodes_matrix = [random.randint(0, 2 ** dim - 1) for _ in range(total_num_nodes)]
    # Removes duplicates
    nodes_matrix = list(set(nodes_matrix))
    # Transforms each node from decimal to string representation
    nodes_matrix = [('{0:0' + str(dim) + 'b}').format(x) for x in nodes_matrix]
    # Transforms each bit into an integer.
    nodes_matrix = np.asarray([list(map(int, list(x))) for x in nodes_matrix], dtype=np.uint8)
    return nodes_matrix
The problem is that when I pass very large values, say total_num_nodes = 10,000,000 and dim = 128, generation takes a very long time. A friend hinted that the following line is the bottleneck and is likely responsible for most of the computation time:
# Transforms each node from decimal to string representation
nodes_matrix = [('{0:0' + str(dim) + 'b}').format(x) for x in nodes_matrix]
I cannot think of a faster approach to replace this line and speed up generation when running on a single processor. Any suggestions would be greatly appreciated.
Thank you
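For reference, the hint about the bottleneck could be checked with a rough timing sketch like the one below. It repeats the same four steps as generate() and records the wall-clock time of each. The name timed_generate and the timings dictionary are made up for this illustration, and the format string is built once outside the list comprehension, which differs slightly from the original line.

```python
import random
import time

import numpy as np


def timed_generate(total_num_nodes, dim):
    """Run the same steps as generate() and record seconds spent in each."""
    timings = {}

    # Step 1: draw random integers in [0, 2**dim - 1]
    start = time.time()
    nodes = [random.randint(0, 2 ** dim - 1) for _ in range(total_num_nodes)]
    timings['random'] = time.time() - start

    # Step 2: remove duplicates
    start = time.time()
    nodes = list(set(nodes))
    timings['dedupe'] = time.time() - start

    # Step 3: format each integer as a zero-padded binary string
    start = time.time()
    fmt = '{0:0' + str(dim) + 'b}'
    nodes = [fmt.format(x) for x in nodes]
    timings['to_string'] = time.time() - start

    # Step 4: turn each character into an integer bit
    start = time.time()
    matrix = np.asarray([list(map(int, x)) for x in nodes], dtype=np.uint8)
    timings['to_bits'] = time.time() - start

    return matrix, timings


# Small input so the sketch finishes quickly; scale up to see the trend.
matrix, timings = timed_generate(10000, 128)
for step, seconds in sorted(timings.items()):
    print('%-10s %.4f s' % (step, seconds))
```

The absolute numbers depend on the machine and Python version, so only the relative share of each step is meaningful.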
Upvotes: 0
Views: 39
Reputation: 177725
Do it all in numpy and it will be faster.
The following generates total_num_nodes rows of dim columns of np.uint8 data, then keeps the unique rows by providing a view of the data suitable for np.unique, then translates back to a 2D array:
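import numpy as np

def generate(total_num_nodes, dim):
    # Random 0/1 matrix: one row per node, one column per bit
    a = np.random.choice(np.array([0, 1], dtype=np.uint8), size=(total_num_nodes, dim))
    # View each row as a single structured element so np.unique compares whole rows
    dtype = a.dtype.descr * dim
    temp = a.view(dtype)
    uniq = np.unique(temp)
    # Convert the unique rows back to a plain 2D uint8 array
    return uniq.view(a.dtype).reshape(-1, dim)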
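A possible variation on the same idea, given here only as a sketch: draw random bytes instead of random bits, deduplicate while the rows are still packed, and expand to bits at the end with np.unpackbits. This assumes dim is a multiple of 8 (true for dim=128) and a NumPy version that supports the axis argument of np.unique (1.13 or later). The name generate_packed and the seed parameter are made up for this example.

```python
import numpy as np


def generate_packed(total_num_nodes, dim, seed=None):
    """Return unique random bit rows as a (n_unique, dim) uint8 array."""
    if dim % 8 != 0:
        raise ValueError('this sketch assumes dim is a multiple of 8')

    rng = np.random.RandomState(seed)
    # One random byte covers 8 bits, so each row needs dim // 8 bytes.
    packed = rng.randint(0, 256, size=(total_num_nodes, dim // 8), dtype=np.uint8)
    # Drop duplicate rows while the data is still packed (8x fewer bytes to sort).
    packed = np.unique(packed, axis=0)
    # Expand every byte into its 8 bits, most significant bit first.
    return np.unpackbits(packed, axis=1)


bits = generate_packed(1000, 128, seed=0)
print(bits.shape, bits.dtype)
```

As with the np.unique approach above, the rows come back sorted rather than in generation order, and fewer than total_num_nodes rows are returned if any duplicates were drawn.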
Upvotes: 1