Reputation: 35
I have a very big 500,000 X 500,000 but sparse matrix. I want to find its norm using Python, I tried using:
%timeit numpy.linalg.norm(a, axis=1)
where a
is the matrix
I also tried this:
b = numpy.asarray(a, numpy.float32)
numpy.linalg.norm(b, axis=1)
But the Google Colabalways crashes.
I also tried doing this:
import numpy as np
a = np.random.rand(1000000,100)
print(np.linalg.norm(a, axis =1).shape)
def g(d, out=None):
bs = 2000
if out is None:
r = np.empty(d.shape[0])
else:
r = out
for i in range(0, d.shape[0], bs):
u = min(i + bs, d.shape[0])
r[i:u] = np.linalg.norm(d[i:u], axis=1)
return r
print((g(a) == numpy.linalg.norm(a, axis =1)).all())
print("blocked")
%timeit -n 10 g(a)
print("normal")
%timeit -n 10 numpy.linalg.norm(a, axis =1)
from one of the previous stack overflow questions, but still the kernel crashes, Is there a way I can do that?
Upvotes: 0
Views: 85
Reputation: 231615
If I make a modest sized sparse matrix:
In [109]: M = sparse.random(10,100,.2, 'csr'); M
Out[109]:
<10x100 sparse matrix of type '<class 'numpy.float64'>'
with 200 stored elements in Compressed Sparse Row format>
I can't use np.linalg.norm
on it:
In [110]: np.linalg.norm(M,axis=1)
---------------------------------------------------------------------------
AxisError Traceback (most recent call last)
Cell In[110], line 1
----> 1 np.linalg.norm(M,axis=1)
File <__array_function__ internals>:200, in norm(*args, **kwargs)
File ~\miniconda3\lib\site-packages\numpy\linalg\linalg.py:2542, in norm(x, ord, axis, keepdims)
2539 elif ord is None or ord == 2:
2540 # special case for speedup
2541 s = (x.conj() * x).real
-> 2542 return sqrt(add.reduce(s, axis=axis, keepdims=keepdims))
2543 # None of the str-type keywords for ord ('fro', 'nuc')
2544 # are valid for vectors
2545 elif isinstance(ord, str):
AxisError: axis 1 is out of bounds for array of dimension 0
But I can do it on its dense equivalent. M.A
will produce a memory error is M
dimensions are large.
In [111]: np.linalg.norm(M.A,axis=1)
Out[111]:
array([2.08644827, 2.61130439, 2.69501798, 2.33149559, 2.53580021,
3.35499087, 2.14149997, 1.45864564, 2.56890956, 2.76315379])
sparse
has a norm that works on M
:
In [112]: sparse.linalg.norm(M,axis=1)
Out[112]:
array([2.08644827, 2.61130439, 2.69501798, 2.33149559, 2.53580021,
3.35499087, 2.14149997, 1.45864564, 2.56890956, 2.76315379])
For this L2
norm I can do the sparse calculation directly:
In [113]: M.power(2).sum(axis=1)
Out[113]:
matrix([[ 4.35326638],
[ 6.81891059],
[ 7.2631219 ],
[ 5.43587169],
[ 6.43028271],
[11.25596371],
[ 4.58602212],
[ 2.1276471 ],
[ 6.59929632],
[ 7.63501886]])
This sum produces a (1,10) dense matrix. Using A1
to make a 1d array, I get the same numbers:
In [114]: M.power(2).sum(axis=1).A1**.5
Out[114]:
array([2.08644827, 2.61130439, 2.69501798, 2.33149559, 2.53580021,
3.35499087, 2.14149997, 1.45864564, 2.56890956, 2.76315379])
However it's done we need a tempory array/matrix that's just as large to hold the square values. The sum reduces the dimension etc.
I don't need the A1
step if I make a sparse_array
(which sparse
is moving towards slowly).
In [115]: M = sparse.random(10,100,.2, 'csr'); M = sparse.csr_array(M);M
Out[115]:
<10x100 sparse array of type '<class 'numpy.float64'>'
with 200 stored elements in Compressed Sparse Row format>
In [116]: sparse.linalg.norm(M,axis=1)
Out[116]:
array([2.21844541, 2.3164772 , 2.56784187, 2.69002853, 2.57405933,
3.44123739, 2.41489642, 1.87393488, 2.08952551, 2.93634053])
In [117]: M.power(2).sum(1)
Out[117]:
array([ 4.92150002, 5.36606664, 6.59381185, 7.23625348, 6.62578144,
11.84211477, 5.83172473, 3.51163195, 4.36611684, 8.62209571])
Upvotes: 0