adamconkey

Reputation: 4745

Sparse random matrix in Python with different range than [0,1]

I need to generate a sparse random matrix in Python whose nonzero values are uniformly distributed in the range [-1,1]. What is the most efficient way to do this?

I have a basic sparse random matrix:

from scipy import sparse
from numpy.random import RandomState

p = sparse.rand(10, 10, 0.1, random_state=RandomState(1))

And this gives me values in [0,1]:

print(p)
  (0, 0)    0.419194514403
  (0, 3)    0.0273875931979
  (1, 4)    0.558689828446
  (2, 7)    0.198101489085
  (3, 5)    0.140386938595
  (4, 1)    0.204452249732
  (4, 3)    0.670467510178
  (8, 1)    0.878117436391
  (9, 0)    0.685219500397
  (9, 3)    0.417304802367

It would be good to have an in-place solution, or at least one that doesn't require expanding to a full dense matrix, since in practice I will be using very large dimensions. It surprises me that sparse.rand itself has no parameters for this.

Upvotes: 6

Views: 1722

Answers (2)

hpaulj

Reputation: 231355

Since sparse.rand produces a COO matrix by default, you can manipulate its .data attribute directly. (A matrix in 'csr' format can be transformed the same way.)

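from scipy import sparse

p = sparse.rand(10, 10, 0.1)
p.data *= 2   # stretch the stored values from [0,1) to [0,2)
p.data -= 1   # shift them to [-1,1)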

Before and after values would be:

  (0, 4)    0.758811389117
  (1, 8)    0.703514506105
  (1, 9)    0.640418745353
  (4, 0)    0.896198785835
  (4, 6)    0.511459880587
  (5, 2)    0.580048680358
  (7, 1)    0.739418689993
  (8, 3)    0.506395207688
  (8, 5)    0.900696518461
  (9, 4)    0.474014207942

  (0, 4)    0.517622778234
  (1, 8)    0.40702901221
  (1, 9)    0.280837490706
  (4, 0)    0.79239757167
  (4, 6)    0.0229197611736
  (5, 2)    0.160097360716
  (7, 1)    0.478837379986
  (8, 3)    0.0127904153758
  (8, 5)    0.801393036923
  (9, 4)    -0.051971584115

The sparsity pattern is unchanged; only the distribution of the values differs.
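The same in-place trick generalizes to any interval [a,b]. Below is a minimal sketch; the helper name rescale_data_inplace is hypothetical. It only touches p.data, so no dense array is created and it applies to any format that stores its values in a .data array (COO, CSR, CSC).

```python
import numpy as np
from scipy import sparse

def rescale_data_inplace(p, a, b):
    """Hypothetical helper: map the stored values of p from [0, 1) to [a, b).

    Only p.data is modified, so the sparsity pattern is kept and no
    dense matrix is ever built.
    """
    p.data *= (b - a)   # stretch [0, 1) to [0, b - a)
    p.data += a         # shift to [a, b)
    return p

p = sparse.rand(10, 10, 0.1, random_state=np.random.RandomState(1))
before = p.data.copy()
rescale_data_inplace(p, -1.0, 1.0)
```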

In fact, you could generate completely new .data values. The end of the sparse.rand source is:

....
j = .... # tweak random values
i = ...  # tweak ints
vals = random_state.rand(k).astype(dtype)
return coo_matrix((vals, (i, j)), shape=(m, n)).asformat(format)

The random matrix is built from three random sequences: two produce integer row and column indices within the matrix shape, and the third produces the values.
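To make the three-sequence idea concrete, here is a minimal sketch of a constructor that follows the same pattern but draws the values from U[low, high) directly. The function sparse_uniform is hypothetical (it is not SciPy's code), and it assumes NumPy's Generator API (np.random.default_rng, available since NumPy 1.17). Sampling distinct flat positions without replacement keeps any (i, j) pair from repeating.

```python
import numpy as np
from scipy import sparse

def sparse_uniform(m, n, density, low=-1.0, high=1.0, seed=None):
    """Hypothetical helper: m x n COO matrix with values uniform in [low, high)."""
    rng = np.random.default_rng(seed)
    k = int(round(density * m * n))            # number of stored entries
    # Sequences 1 and 2: distinct flat positions split into row/column indices.
    flat = rng.choice(m * n, size=k, replace=False)
    i, j = np.divmod(flat, n)                  # row = flat // n, col = flat % n
    # Sequence 3: the values themselves.
    vals = rng.uniform(low, high, size=k)
    return sparse.coo_matrix((vals, (i, j)), shape=(m, n))

p = sparse_uniform(10, 10, 0.1, seed=1)
```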

For example, with values chosen at random from a list:

In [209]: p.data=np.random.choice(np.arange(20)-10,len(p.data))/10

In [210]: print(p.A)
[[ 0.   0.   0.   0.   0.9  0.   0.   0.   0.   0. ]
 [ 0.   0.   0.   0.   0.   0.   0.   0.  -0.1 -0.7]
 [ 0.   0.   0.   0.   0.   0.   0.   0.   0.   0. ]
 [ 0.   0.   0.   0.   0.   0.   0.   0.   0.   0. ]
 [-1.   0.   0.   0.   0.   0.  -0.8  0.   0.   0. ]
 [ 0.   0.   0.5  0.   0.   0.   0.   0.   0.   0. ]
 [ 0.   0.   0.   0.   0.   0.   0.   0.   0.   0. ]
 [ 0.   0.   0.   0.   0.   0.   0.   0.   0.   0. ]
 [ 0.   0.   0.   0.5  0.   0.4  0.   0.   0.   0. ]
 [ 0.   0.   0.   0.  -0.8  0.   0.   0.   0.   0. ]]
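Applied to the question's [-1,1] range, a minimal sketch of regenerating the values: keep the positions that sparse.rand chose and overwrite p.data with fresh uniform draws of the same length. Reusing one RandomState for both steps is only a choice made here for reproducibility.

```python
import numpy as np
from scipy import sparse

rs = np.random.RandomState(1)
p = sparse.rand(10, 10, 0.1, random_state=rs)
# Replace the stored values with draws from U[-1, 1);
# the (row, col) positions picked by sparse.rand stay as they are.
p.data = rs.uniform(-1.0, 1.0, size=len(p.data))
```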

The development version of the code changes the second-to-last line to:

vals = data_rvs(k).astype(dtype)

where data_rvs is a new parameter that defaults to random_state.rand.

Upvotes: 2

Eric Appelt

Reputation: 2973

It looks like the feature you want was added about two months ago and will be available in SciPy 0.16: https://github.com/scipy/scipy/blob/77af8f44bef43a67cb14c247bc230282022ed0c2/scipy/sparse/construct.py#L671

You will be able to call sparse.random(10, 10, 0.1, random_state=RandomState(1), data_rvs=func), where func "should take a single argument specifying the length of the ndarray that it will return. The structurally nonzero entries of the sparse random matrix will be taken from the array sampled by this function." So you will be able to provide an arbitrary distribution to sample from.
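For the [-1,1] case, a minimal sketch of such a call, assuming SciPy 0.16 or later: data_rvs is given a lambda that receives the number of stored entries k and returns k uniform draws. Sharing the RandomState between positions and values is just one way to keep the result reproducible.

```python
import numpy as np
from scipy import sparse

rs = np.random.RandomState(1)

# sparse.random calls data_rvs with the number of stored entries k and
# uses the returned length-k array as the nonzero values.
p = sparse.random(10, 10, density=0.1, random_state=rs,
                  data_rvs=lambda k: rs.uniform(-1.0, 1.0, size=k))
```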

For now, you can at least stretch the uniform distribution to [0,N] by multiplying p by a scalar N:

>>> print(2*p)

(0, 0)  0.838389028807
(9, 0)  1.37043900079
(4, 1)  0.408904499463
(8, 1)  1.75623487278
(0, 3)  0.0547751863959
(4, 3)  1.34093502036
(9, 3)  0.834609604734
(1, 4)  1.11737965689
(3, 5)  0.28077387719
(2, 7)  0.39620297817

You can't add a scalar to a sparse matrix, but as a bit of a hack you can use p.ceil() to get a sparse matrix with ones at all the nonzero positions, since all elements of p were generated within [0,1]. Then, to transform the uniform distribution to [-1,1], you can do:

>>> print(2*p - p.ceil())

(0, 0)  -0.161610971193
(0, 3)  -0.945224813604
(1, 4)  0.117379656892
(2, 7)  -0.60379702183
(3, 5)  -0.71922612281
(4, 1)  -0.591095500537
(4, 3)  0.340935020357
(8, 1)  0.756234872782
(9, 0)  0.370439000794
(9, 3)  -0.165390395266

So, in general, for an interval [a,b] you can do:

p = (b - a)*p + a*p.ceil()
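Wrapped as a function, the general formula might look like the sketch below; the name rescale_ceil is hypothetical. It assumes every stored value of p is strictly positive, so that p.ceil() is 1 exactly where p has an entry. A stored value of exactly 0 would map to 0 rather than to a.

```python
import numpy as np
from scipy import sparse

def rescale_ceil(p, a, b):
    """Hypothetical helper: map stored values of p from [0, 1) to [a, b).

    Uses p.ceil() as a sparse "ones at the nonzero positions" matrix,
    which is only valid if all stored values are strictly positive.
    """
    return (b - a) * p + a * p.ceil()

p = sparse.rand(10, 10, 0.1, random_state=np.random.RandomState(1))
q = rescale_ceil(p, -1, 1)
```

Manipulating p.data directly, as in the other answer, avoids relying on ceil() at all.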

I can't see a much better solution at present, short of writing your own constructor similar to sparse.rand, but I would be curious to know whether anyone knows a way around the ceil() hack.

Upvotes: 6
