Galaxy
Galaxy

Reputation: 2012

How to plot anndata like matrix to grey 2d numpy array

Input data

Output


I want to plot values of mtx.todense()[1:2,:] with x,y = cspdata2d[index] to outimg, where index is from the 2nd dimensions of mtx.

It is like scanpy.pl.spatial, but I want to plot to pixels. And scanpy is slow on huge data set.

Upvotes: 2

Views: 64

Answers (1)

Jérôme Richard
Jérôme Richard

Reputation: 50826

Here is a relatively-efficient Numpy solution:

import numpy as np
from scipy.sparse.coo import coo_matrix

# Compute `mtx.toarray()[1:2,:]`
# Time: 0.76 s
selected_row = 1
select = mtx.row == selected_row
assert np.array_equiv(mtx.row[select], selected_row)
slice_data = mtx.data[select]
slice_rows = np.zeros(np.count_nonzero(select))
slice_cols = mtx.col[select]
mtx_sparse_slice = coo_matrix((slice_data, (slice_rows, slice_cols)), shape=(1,mtx.shape[1]))
mtx_dense_slice = mtx_sparse_slice.toarray()

# Compute `x,y = cspdata2d[index]` where index is from the 2nd dimensions of mtx
# Assume cspdata2d contains [x,y] values in each row and not [y,x] ones
# Assume the zeros values are not interesting so they are not included
# Time: 0.42 s
index = mtx_dense_slice[0,:]
nnz_index = np.nonzero(index)[0]
x = cspdata2d[nnz_index,0]
y = cspdata2d[nnz_index,1]

# Compute outimg
# Time: 0.75 s
outimg = np.zeros(shape=cspdata2d.max(axis=0), dtype=np.int16) 
outimg[x, y] = nnz_index.astype(np.int16)

Here is a setup to test this with random values:

mtx_nnz = 357_558_594
mtx_nrows = 30_562
mtx_ncols = 96_629_557
mtx_data = np.random.randint(0, mtx_ncols, size=mtx_nnz).astype(np.int64) # Apparently values <= mtx_ncols
mtx_rows = np.random.randint(0, mtx_nrows, size=mtx_nnz).astype(np.int32)
mtx_cols = np.random.randint(0, mtx_ncols, size=mtx_nnz).astype(np.int32)
mtx = coo_matrix((mtx_data, (mtx_rows, mtx_cols)), shape=(mtx_nrows, mtx_ncols))
del mtx_data, mtx_rows, mtx_ncols # Save some memory space

cspdata2d = np.empty(shape=(mtx_ncols,2), dtype=np.int32)
cspdata2d[:,0] = np.random.randint(0, 51_968, size=mtx_ncols)
cspdata2d[:,1] = np.random.randint(0, 40_141, size=mtx_ncols)

The overall execution time is about 2 seconds on my machine which is relatively fast (though not optimal) for a (sequential) Numpy code considering the significant size of the input data (>=10 GiB data to read/write).

Upvotes: 0

Related Questions