Reputation: 4745
I need to generate a sparse random matrix in Python with all values in the range [-1,1]
with uniform distribution. What is the most efficient way to do this?
I have a basic sparse random matrix:
from scipy import sparse
from numpy.random import RandomState
p = sparse.rand(10, 10, 0.1, random_state=RandomState(1))
And this gives me values in [0,1]:
print(p)
(0, 0) 0.419194514403
(0, 3) 0.0273875931979
(1, 4) 0.558689828446
(2, 7) 0.198101489085
(3, 5) 0.140386938595
(4, 1) 0.204452249732
(4, 3) 0.670467510178
(8, 1) 0.878117436391
(9, 0) 0.685219500397
(9, 3) 0.417304802367
It would be good to have an in-place solution, or something that doesn't require blowing it up to a full matrix, since in practice I will be using very large dimensions. It surprises me that there aren't some quick parameters to set for sparse.rand itself.
Upvotes: 6
Views: 1722
Reputation: 231355
Since sparse.rand
produces a coo
matrix by default, you can directly manipulate its .data
attribute (a 'csr' matrix could be transformed the same way):
p = sparse.rand(10, 10, 0.1)
p.data *= 2
p.data -= 1
Before and after values would be:
(0, 4) 0.758811389117
(1, 8) 0.703514506105
(1, 9) 0.640418745353
(4, 0) 0.896198785835
(4, 6) 0.511459880587
(5, 2) 0.580048680358
(7, 1) 0.739418689993
(8, 3) 0.506395207688
(8, 5) 0.900696518461
(9, 4) 0.474014207942
(0, 4) 0.517622778234
(1, 8) 0.40702901221
(1, 9) 0.280837490706
(4, 0) 0.79239757167
(4, 6) 0.0229197611736
(5, 2) 0.160097360716
(7, 1) 0.478837379986
(8, 3) 0.0127904153758
(8, 5) 0.801393036923
(9, 4) -0.051971584115
Same spatial density, just different value distribution.
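To make that concrete, here is a self-contained sketch of the same in-place transformation (using the question's dimensions), with checks that only the stored values change, not the sparsity pattern:

```python
import numpy as np
from scipy import sparse

# Build the sparse random matrix; values start out uniform in [0, 1).
p = sparse.rand(10, 10, 0.1, random_state=np.random.RandomState(1))
nnz_before = p.nnz

# Affine map [0,1] -> [-1,1], applied only to the stored (nonzero) entries.
p.data *= 2
p.data -= 1

# The sparsity pattern is unchanged; only the values moved.
assert p.nnz == nnz_before
assert p.data.min() >= -1 and p.data.max() <= 1
```

Since the operation touches only the `.data` array, it never materializes the full matrix, so it stays cheap even for very large dimensions.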
In fact you could generate completely new .data
values. The end of sparse.rand
is:
....
j = .... # tweak random values
i = ... # tweak ints
vals = random_state.rand(k).astype(dtype)
return coo_matrix((vals, (i, j)), shape=(m, n)).asformat(format)
The sparse matrix is generated from 3 random sequences: two producing integer row and column indices in the right ranges, and a third producing the random values.
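Following that recipe, a minimal sketch of building the matrix yourself with uniform [-1,1) values directly (the index sampling here is simplified relative to scipy's internals, but it keeps the k positions distinct):

```python
import numpy as np
from scipy.sparse import coo_matrix

rng = np.random.RandomState(1)
m, n, density = 10, 10, 0.1
k = int(round(density * m * n))  # number of stored entries

# Sample k distinct flat positions, then split into row/column indices.
flat = rng.choice(m * n, size=k, replace=False)
i, j = np.divmod(flat, n)

# Draw the values uniformly from [-1, 1) instead of the default [0, 1).
vals = rng.uniform(-1, 1, size=k)

p = coo_matrix((vals, (i, j)), shape=(m, n))
```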
For example random values chosen from a list:
In [209]: p.data = np.random.choice(np.arange(20) - 10, len(p.data)) / 10
In [210]: print(p.A)
[[ 0. 0. 0. 0. 0.9 0. 0. 0. 0. 0. ]
[ 0. 0. 0. 0. 0. 0. 0. 0. -0.1 -0.7]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
[-1. 0. 0. 0. 0. 0. -0.8 0. 0. 0. ]
[ 0. 0. 0.5 0. 0. 0. 0. 0. 0. 0. ]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
[ 0. 0. 0. 0.5 0. 0.4 0. 0. 0. 0. ]
[ 0. 0. 0. 0. -0.8 0. 0. 0. 0. 0. ]]
The development code just changes the second-to-last line to:
vals = data_rvs(k).astype(dtype)
where data_rvs
is a parameter (defaulting to random_state.rand
).
Upvotes: 2
Reputation: 2973
Looks like the feature that you want was added about two months ago and will be available in scipy 0.16: https://github.com/scipy/scipy/blob/77af8f44bef43a67cb14c247bc230282022ed0c2/scipy/sparse/construct.py#L671
You will be able to call sparse.random(10, 10, 0.1, random_state=RandomState(1), data_rvs=func)
where func
"should take a single argument specifying the length of the ndarray that it will return. The structurally nonzero entries of the sparse random matrix will be taken from the array sampled by this function." So you will be able to provide an arbitrary distribution to sample from.
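With a SciPy version that includes this feature, a sketch of the call looks like the following (assuming scipy.sparse.random with the data_rvs parameter is available in your installed version):

```python
import numpy as np
from scipy import sparse

rng = np.random.RandomState(1)

# data_rvs receives the number of values to draw and returns that many
# samples; here it draws them uniformly from [-1, 1).
p = sparse.random(10, 10, density=0.1, random_state=rng,
                  data_rvs=lambda k: rng.uniform(-1, 1, k))
```

This sidesteps any post-processing: the nonzero entries are sampled from the desired distribution at construction time.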
For now, you can at least stretch the uniform distribution to [0,N] by multiplying p by a scalar N:
>>> print 2*p
(0, 0) 0.838389028807
(9, 0) 1.37043900079
(4, 1) 0.408904499463
(8, 1) 1.75623487278
(0, 3) 0.0547751863959
(4, 3) 1.34093502036
(9, 3) 0.834609604734
(1, 4) 1.11737965689
(3, 5) 0.28077387719
(2, 7) 0.39620297817
You can't add a scalar to a sparse matrix, but as a bit of a hack you can create a sparse matrix with ones at the nonzero positions via p.ceil()
, since all elements of p were generated within [0,1]. Then, to transform the uniform distribution to [-1,1], you can do
print(2*p - p.ceil())
(0, 0) -0.161610971193
(0, 3) -0.945224813604
(1, 4) 0.117379656892
(2, 7) -0.60379702183
(3, 5) -0.71922612281
(4, 1) -0.591095500537
(4, 3) 0.340935020357
(8, 1) 0.756234872782
(9, 0) 0.370439000794
(9, 3) -0.165390395266
So in general if you need some interval [a,b] just perform:
p = (b - a)*p + a*p.ceil()
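A quick sketch checking that transformation for an arbitrary interval (the values of a and b here are chosen purely for illustration):

```python
import numpy as np
from scipy import sparse

p = sparse.rand(10, 10, 0.1, random_state=np.random.RandomState(1))
a, b = -3.0, 5.0  # arbitrary target interval

# (b - a)*p rescales the stored [0,1] values to [0, b-a]; a*p.ceil()
# adds a at exactly the stored positions, shifting them into [a, b].
q = (b - a) * p + a * p.ceil()

# Check the transformed values at the original nonzero positions.
vals = q.toarray()[p.toarray() != 0]
assert vals.min() >= a and vals.max() <= b
```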
I can't see much of a better solution at present, short of writing your own constructor similar to sparse.rand
, but I would be curious to know if anyone knows a way to get around the ceil()
hack.
Upvotes: 6