P-M

Reputation: 1399

Plot heatmap of sparse matrix

I have a large sparse matrix containing a histogram, which I would like to plot as a heatmap. Normally I would simply plot the full matrix (h) as follows:

import matplotlib.pyplot as plt
plt.imshow(h.T, interpolation="nearest", origin="lower")
plt.colorbar()
plt.savefig("corr.eps")

In this case, however, the full matrix would be 189,940 x 189,940, which is too large for me to hold in memory. I have found posts on plotting the sparsity pattern (e.g. "python matplotlib plot sparse matrix pattern"), but nothing on how to plot a heatmap without first converting to a dense matrix. Is it possible to do so? (Or is there some other way of plotting it without running out of RAM?) My sparse matrix is currently a lil_matrix (scipy.sparse.lil_matrix).

Upvotes: 3

Views: 6136

Answers (2)

Adam

Reputation: 17339

Paul's approach is what matspy uses to make spy plots. Visually it looks like this:

[Image: matspy triple product]

Matspy only cares about the sparsity pattern and not the values, but we can use its internal helper method that creates those left and right matrices:

# data is assumed to be an existing scipy.sparse matrix
binned_shape = tuple(x // 3 for x in data.shape)  # example: shrink each dimension to a third

from matspy.adapters.scipy_impl import generate_spy_triple_product_coo

left, right = generate_spy_triple_product_coo(data.shape, binned_shape)

result = left @ data @ right
result = result.todense()

Upvotes: 0

Paul Panzer

Reputation: 53029

One idea would be to downsample using sparse operations.

import numpy as np
from scipy import sparse

data = data.tocsc()       # lil is slow for arithmetic; sparse products are more efficient on csc
N, M = data.shape
s, t = 400, 400           # decimation factors for y and x directions
T = sparse.csc_matrix((np.ones((M,)), np.arange(M), np.r_[np.arange(0, M, t), M]), (M, (M-1) // t + 1))
S = sparse.csr_matrix((np.ones((N,)), np.arange(N), np.r_[np.arange(0, N, s), N]), ((N-1) // s + 1, N))
result = S @ data @ T     # downsample by binning into s x t rectangles
result = result.todense() # small dense matrix, ready for plotting

This code snippet implements simple binning, but it could be refined to incorporate more sophisticated filters. The binning matrices are just binned identity matrices; for example, S_ij = 1 if j // s == i, else 0.
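To make the shape of S concrete, here is a minimal sketch that builds it for a tiny case. The helper name row_binning_matrix and the sizes (7 rows, bins of 3) are only illustrative and are not part of the snippet above; the constructor call is the same CSR (data, indices, indptr) form.

```python
import numpy as np
from scipy import sparse

def row_binning_matrix(n, s):
    """Return the (ceil(n/s) x n) CSR matrix S with S[i, j] = 1 when j // s == i.

    Hypothetical helper, written only to illustrate the construction.
    """
    n_bins = (n - 1) // s + 1
    # Row i of S owns the original indices indptr[i]:indptr[i+1].
    indptr = np.r_[np.arange(0, n, s), n]
    return sparse.csr_matrix((np.ones(n), np.arange(n), indptr), shape=(n_bins, n))

S = row_binning_matrix(7, 3)
print(S.toarray())
```

For n = 7 and s = 3 this gives three rows: the first two each cover three original rows, and the last one covers only the single leftover row, so a size that is not a multiple of s is handled without padding.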

Some more explanation. Since the original matrix is very large, there is scope to downsample it without any visually noticeable difference in the output.

The question is how to downsample without creating a dense representation first. One possible answer is to express the binning in terms of matrix multiplication and then use sparse matrix multiplication.

So, if you multiply your original data from the right by a binning matrix T, the columns of T correspond to the column bins; in particular, the number of columns of T determines how many pixels the downsampled data will have in the x direction. Each column of T determines what goes into the corresponding bin and what does not. In the example, I set the entries corresponding to a run of adjacent columns (of the original matrix) to 1 and the rest to 0. The product then sums across those columns and puts the sum in the result matrix; in other words, it bins those columns together.

Multiplying from the left works in exactly the same way, only it affects rows, not columns.
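As a sanity check on the row and column picture, the following sketch bins a small toy matrix with the two sparse products and compares the result with block sums computed directly on a dense copy. The function name bin_sparse and the 5 x 7 toy data are assumptions made for illustration only.

```python
import numpy as np
from scipy import sparse

def bin_sparse(data, s, t):
    """Sum a sparse matrix over s x t blocks using two sparse products.

    Hypothetical wrapper around the S @ data @ T idea; not a library function.
    """
    data = sparse.csc_matrix(data)
    n, m = data.shape
    S = sparse.csr_matrix((np.ones(n), np.arange(n), np.r_[np.arange(0, n, s), n]),
                          shape=((n - 1) // s + 1, n))
    T = sparse.csc_matrix((np.ones(m), np.arange(m), np.r_[np.arange(0, m, t), m]),
                          shape=(m, (m - 1) // t + 1))
    return (S @ data @ T).toarray()   # only the small binned result is densified

# Toy data, purely illustrative: a 5 x 7 matrix with a few nonzero counts.
rng = np.random.default_rng(0)
dense = rng.integers(0, 4, size=(5, 7)) * (rng.random((5, 7)) < 0.3)
binned = bin_sparse(sparse.lil_matrix(dense), s=2, t=3)

# Reference: the same block sums computed directly on the dense copy.
reference = np.array([[dense[i:i + 2, j:j + 3].sum() for j in range(0, 7, 3)]
                      for i in range(0, 5, 2)])
print(np.array_equal(binned, reference))
```

The dense copy exists here only because the toy matrix is tiny; for the real data only the binned result is ever densified, and it can then be passed to plt.imshow as in the question.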

If you feel that binning is too crude, you can replace the simple zero-one scheme with, for example, a smooth kernel; just make sure that the resulting matrix remains sparse. Setting up such a matrix requires a bit more effort, but is not difficult. You are using a sparse matrix for your data, so I assume you are familiar with how to construct one.
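One possible smoother scheme, sketched here under my own assumptions (the tent-shaped weights and the helper name tent_binning_matrix are just one illustrative choice, not something prescribed above): each original column is split between its two nearest bins with linear weights, so the matrix has at most two nonzeros per row and stays sparse.

```python
import numpy as np
from scipy import sparse

def tent_binning_matrix(m, t):
    """Return an (m x n_bins) sparse matrix that spreads each of the m original
    columns over its two nearest bins with linear (tent) weights.

    Hypothetical helper; the kernel is an assumption chosen for illustration.
    """
    n_bins = (m - 1) // t + 1
    pos = (np.arange(m) + 0.5) / t - 0.5      # column position measured in bin units
    lo = np.floor(pos).astype(int)            # index of the bin centre to the left
    frac = pos - lo                           # share that goes to the bin on the right
    rows = np.tile(np.arange(m), 2)
    # Columns near the edges fold their full weight into the outermost bins.
    cols = np.clip(np.r_[lo, lo + 1], 0, n_bins - 1)
    vals = np.r_[1.0 - frac, frac]
    # Duplicate (row, col) pairs created by the clipping are summed on conversion.
    return sparse.coo_matrix((vals, (rows, cols)), shape=(m, n_bins)).tocsc()

T = tent_binning_matrix(10, 4)
print(T.toarray().round(3))
```

Because every row of T sums to 1, the total count in the histogram is preserved. The row-direction counterpart is the transpose, for instance tent_binning_matrix(N, s).T, and the product is formed as before with S @ data @ T.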

Upvotes: 2
