Reputation: 1343
I have a large sparse matrix, in the scipy lil_matrix format the size is 281903x281903, it is an adjacency matrix https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.lil_matrix.html
I need a reliable way to get N indexes that are zero. I can't just draw all zero indexes and then choose random ones, since that makes my computer run out of memory. Are there a way of identifying N random indexes without having to trawl through the entire data-structure?
I currently get 10% of the non zero indices the following way (Y is my sparse matrix):
percent = 0.1
oneIdx = Y.nonzero()
numberOfOnes = len(oneIdx[0])
maskLength = int(math.floor(numberOfOnes * percent))
idxOne = np.array(random.sample(range(0,numberOfOnes), maskLength))
maskOne = tuple(np.asarray(oneIdx)[:,idxOne])
I am looking for way to get a mask with the same length as the non zero mask, but with zeros...
Upvotes: 0
Views: 521
Reputation: 11602
Here is an approach based on rejection sampling. Based on the numbers in your example, an index chosen uniformly at random is likely to be zero, so this will be a relatively efficient approach.
from scipy import sparse
dims = (281903, 281903)
mat = sparse.lil_matrix(dims, dtype=np.int)
for _ in range(1000):
x, y = np.random.randint(0, dims[0], 2)
mat[x, y] = 1
def sample_zero_forever(mat):
nonzero_or_sampled = set(zip(*mat.nonzero()))
while True:
t = tuple(np.random.randint(0, mat.shape[0], 2))
if t not in nonzero_or_sampled:
yield t
nonzero_or_sampled.add(t)
def sample_zero_n(mat, n=100):
itr = sample_zero_forever(mat)
return [next(itr) for _ in range(n)]
Upvotes: 2