guanglei

Reputation: 176

Python: how to generate a random sparse symmetric matrix?

How can I use Python to generate a random sparse symmetric matrix?

In MATLAB, there is a function "sprandsym(n, density)".

But how to do that in Python?

Upvotes: 2

Views: 4959

Answers (3)

unutbu

Reputation: 879561

If you have scipy, you could use sparse.random. The sprandsym function below generates a sparse random matrix X, takes its upper triangular half (including the diagonal), and adds that half to its own transpose to form a symmetric matrix. Since this doubles the diagonal values, the diagonal of X is subtracted once.

The non-zero values are normally distributed with mean 0 and standard deviation 1. The Kolmogorov-Smirnov test is used to check that the non-zero values are consistent with draws from a normal distribution, and a histogram and QQ-plot are also generated to visualize the distribution.

import numpy as np
import scipy.stats as stats
import scipy.sparse as sparse
import matplotlib.pyplot as plt
np.random.seed((3,14159))

def sprandsym(n, density):
    rvs = stats.norm().rvs
    X = sparse.random(n, n, density=density, data_rvs=rvs)
    upper_X = sparse.triu(X) 
    result = upper_X + upper_X.T - sparse.diags(X.diagonal())
    return result

M = sprandsym(5000, 0.01)
print(repr(M))
# <5000x5000 sparse matrix of type '<class 'numpy.float64'>'
#   with 249909 stored elements in Compressed Sparse Row format>

# check that the matrix is symmetric. The difference should have no non-zero elements
assert (M - M.T).nnz == 0

statistic, pval = stats.kstest(M.data, 'norm')
# The null hypothesis is that M.data was drawn from a normal distribution.
# A small p-value (say, below 0.05) would indicate reason to reject the null hypothesis.
# Since `pval` below is > 0.05, kstest gives no reason to reject the hypothesis
# that M.data is normally distributed.
print(statistic, pval)
# 0.0015998040114 0.544538788914

fig, ax = plt.subplots(nrows=2)
ax[0].hist(M.data, density=True, bins=50)  # 'density' replaces the removed 'normed' argument
stats.probplot(M.data, dist='norm', plot=ax[1])
plt.show()

[Figure: histogram of M.data (top) and normal QQ-plot of M.data (bottom)]
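To see the symmetrization step on something small enough to print, here is a minimal sketch that applies the same triu, transpose, and diagonal-correction steps to a 5x5 matrix. The size, the density, and the use of NumPy's `default_rng` as `data_rvs` are illustrative choices, not part of the answer.

```python
import numpy as np
import scipy.sparse as sparse

rng = np.random.default_rng(0)
# Small random sparse matrix whose non-zeros are N(0, 1) draws
X = sparse.random(5, 5, density=0.4, data_rvs=rng.standard_normal)

upper_X = sparse.triu(X)  # keep the diagonal and everything above it
# Mirror the upper half, then subtract the diagonal once (it was counted twice)
M = upper_X + upper_X.T - sparse.diags(X.diagonal())

dense = M.toarray()
print(np.round(dense, 2))

# The result is symmetric ...
assert np.allclose(dense, dense.T)
# ... its upper triangle (diagonal included) is exactly X's upper triangle ...
assert np.allclose(np.triu(dense), np.triu(X.toarray()))
# ... and its diagonal equals X's diagonal, not twice it.
assert np.allclose(np.diag(dense), X.diagonal())
```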


PS. I used

upper_X = sparse.triu(X) 
result = upper_X + upper_X.T - sparse.diags(X.diagonal())

instead of

result = (X + X.T)/2.0

because I could not convince myself that the non-zero elements in (X + X.T)/2.0 have the right distribution. First, if X were dense and normally distributed with mean 0 and variance 1, i.e. N(0, 1), then the off-diagonal entries of (X + X.T)/2.0 would be N(0, 1/2), since X[i, j] + X[j, i] is the sum of two independent N(0, 1) values and is therefore N(0, 2). Certainly we could fix this by using

result = (X + X.T)/np.sqrt(2.0)

instead. Then the off-diagonal entries of result would be N(0, 1). But there is yet another problem: if X is sparse, then at most non-zero locations of X + X.T only one of X[i, j] and X[j, i] is non-zero, so the value is a normally distributed random variable plus zero. Dividing by sqrt(2.0) squashes those values toward 0, leaving them N(0, 1/2), while the rare locations where both are non-zero stay N(0, 1). The non-zero values are therefore a mixture of the two rather than N(0, 1), and the sparser X is, the more the narrower N(0, 1/2) component dominates.
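The variance argument for the dense case can be checked numerically. This sketch assumes a dense 1000x1000 standard-normal X drawn with NumPy's `default_rng` (an arbitrary choice) and compares the variance of the off-diagonal entries of (X + X.T)/2.0 and (X + X.T)/np.sqrt(2.0).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
X = rng.standard_normal((n, n))       # dense, every entry N(0, 1)

off_diag = ~np.eye(n, dtype=bool)     # mask selecting the off-diagonal entries

half = ((X + X.T) / 2.0)[off_diag]
scaled = ((X + X.T) / np.sqrt(2.0))[off_diag]

# X[i, j] + X[j, i] is N(0, 2), so dividing by 2 gives variance 1/2
# and dividing by sqrt(2) gives variance 1.
print(half.var(), scaled.var())
```

The diagonal is masked out because there (X + X.T)/2.0 is just X[i, i] itself.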

Since I didn't know what distribution (X + X.T)/sqrt(2.0) generates, I opted for copying the upper triangular half of X (thus repeating what I know to be normally distributed non-zero values).
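The sparse case can be checked the same way. This sketch (size, density, and `default_rng` are arbitrary choices) compares the variance of the non-zero values produced by (X + X.T)/np.sqrt(2.0) with those produced by copying the upper triangular half.

```python
import numpy as np
import scipy.sparse as sparse

rng = np.random.default_rng(0)
n, density = 2000, 0.01
X = sparse.random(n, n, density=density, format='csr',
                  data_rvs=rng.standard_normal)

# Scaled sum: most non-zeros come from only one of X[i, j], X[j, i]
averaged = (X + X.T) / np.sqrt(2.0)

# Upper-triangle copy: every non-zero is an original N(0, 1) draw
upper_X = sparse.triu(X)
copied = upper_X + upper_X.T - sparse.diags(X.diagonal())

print(averaged.data.var())  # near 0.5: mostly N(0, 1)/sqrt(2)
print(copied.data.var())    # near 1.0
```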

Upvotes: 6

will

Reputation: 10650

The matrix needs to be symmetric too, which seems to be glossed over by the two answers here:

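import scipy.sparse

def sparseSym(rank, density=0.01, format='coo', dtype=None, random_state=None):
    # "rank" is the matrix size n here; the result is rank x rank
    # Adding A to A.T roughly doubles the off-diagonal density, so shrink it first
    density = density / (2.0 - 1.0/rank)
    A = scipy.sparse.rand(rank, rank, density=density, format=format, dtype=dtype, random_state=random_state)
    return (A + A.transpose())/2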

This will create a sparse matrix and then add its transpose to it to make it symmetric.

It takes into account that the density roughly doubles when you add the two together, and that the diagonal terms get no such increase, since A[i, i] and its transposed counterpart occupy the same position.
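The density correction can be checked with a short calculation. If A has density d', an off-diagonal position of A + A.T is non-zero when either A[i, j] or A[j, i] is, which happens with probability about 2d' for small d', while a diagonal position gets only one draw (probability d'). Averaging over the n^2 - n off-diagonal and n diagonal positions gives an expected density of about (2d'(n^2 - n) + d'n)/n^2 = d'(2 - 1/n), so choosing d' = density/(2 - 1/n) brings the result back to roughly the requested density. The sketch below restates the function above (leaving out the random_state argument) and measures the density it achieves; the size and target are arbitrary.

```python
import scipy.sparse

def sparseSym(rank, density=0.01, format='coo', dtype=None):
    # Per-entry density chosen so that A + A.T lands near the target
    density = density / (2.0 - 1.0 / rank)
    A = scipy.sparse.rand(rank, rank, density=density, format=format, dtype=dtype)
    return (A + A.transpose()) / 2

n, target = 1000, 0.01
M = sparseSym(n, density=target)
achieved = M.nnz / n**2
print(achieved)  # close to the 0.01 target
```

The small remaining shortfall comes from positions where both A[i, j] and A[j, i] were drawn, which the 2d' approximation counts twice.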

Upvotes: 3

en_Knight

Reputation: 5381

unutbu's answer is the best one for performance and extensibility: numpy and scipy together have a lot of the functionality from MATLAB.

If you can't use them for whatever reason, or you're looking for a pure python solution, you could try

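from random import gauss, randrange
N = 100          # matrix size (example value)
num_terms = 50   # number of random positions to fill (example value)
sparse = [[0.0 for i in range(N)] for j in range(N)]
# alternatively, if you have numpy but not scipy:
# sparse = numpy.zeros((N, N))
for _ in range(num_terms):
    (i, j) = (randrange(N), randrange(N))  # random position, 0 <= i, j < N
    x = gauss(0, 1)
    sparse[i][j] = x
    sparse[j][i] = x  # mirror the value to keep the matrix symmetric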

Although it might give you a little more control than unutbu's solution, you should expect it to be significantly slower, and the list of lists stores all N*N entries, zeros included. scipy is a dependency you probably don't want to avoid.
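If the point of a pure-Python version is to avoid storing all N*N entries, a dictionary keyed by (i, j) can hold only the non-zeros. This is an alternative sketch, not part of the answer above; the function name `sparse_sym_dok` and its parameters are hypothetical. It fills about density * N * N / 2 distinct upper-triangle positions and mirrors each one, so the full matrix ends up with roughly density * N * N stored entries.

```python
import random

def sparse_sym_dok(N, density, seed=None):
    """Hypothetical helper: random symmetric matrix stored as {(i, j): value}."""
    rnd = random.Random(seed)
    target = round(density * N * N / 2)  # distinct upper-triangle positions to fill
    entries = {}
    while len(entries) < target:
        i, j = rnd.randrange(N), rnd.randrange(N)
        if i > j:
            i, j = j, i  # fold the position into the upper triangle
        if (i, j) not in entries:
            entries[(i, j)] = rnd.gauss(0, 1)
    # Mirror every upper-triangle entry to make the matrix symmetric
    for (i, j), x in list(entries.items()):
        entries[(j, i)] = x
    return entries

m = sparse_sym_dok(1000, 0.01, seed=1)
print(len(m))  # about 0.01 * 1000 * 1000 stored entries
```

Folding (i, j) into the upper triangle gives each off-diagonal pair twice the chance of a diagonal position, which matches picking uniformly over the full matrix.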

Upvotes: 1
