Another Random Guy
Another Random Guy

Reputation: 35

Getting norm of a big sparse matrix

I have a very big 500,000 X 500,000 but sparse matrix. I want to find its norm using Python, I tried using:

%timeit numpy.linalg.norm(a, axis=1)

where a is the matrix

I also tried this:

b = numpy.asarray(a, numpy.float32)
numpy.linalg.norm(b, axis=1)

But the Google Colabalways crashes.

I also tried doing this:

import numpy as np
a =  np.random.rand(1000000,100)
print(np.linalg.norm(a, axis =1).shape)

def g(d, out=None):
    bs = 2000
    if out is None:
        r = np.empty(d.shape[0])
    else:
        r = out
    for i in range(0, d.shape[0], bs):
        u = min(i + bs, d.shape[0])
        r[i:u] = np.linalg.norm(d[i:u], axis=1)
    return r


print((g(a) == numpy.linalg.norm(a, axis =1)).all())
print("blocked")
%timeit -n 10 g(a)
print("normal")
%timeit -n 10 numpy.linalg.norm(a, axis =1) 

from one of the previous stack overflow questions, but still the kernel crashes, Is there a way I can do that?

Upvotes: 0

Views: 85

Answers (1)

hpaulj
hpaulj

Reputation: 231615

If I make a modest sized sparse matrix:

In [109]: M = sparse.random(10,100,.2, 'csr'); M
Out[109]: 
<10x100 sparse matrix of type '<class 'numpy.float64'>'
    with 200 stored elements in Compressed Sparse Row format>

I can't use np.linalg.norm on it:

In [110]: np.linalg.norm(M,axis=1)
---------------------------------------------------------------------------
AxisError                                 Traceback (most recent call last)
Cell In[110], line 1
----> 1 np.linalg.norm(M,axis=1)

File <__array_function__ internals>:200, in norm(*args, **kwargs)

File ~\miniconda3\lib\site-packages\numpy\linalg\linalg.py:2542, in norm(x, ord, axis, keepdims)
   2539 elif ord is None or ord == 2:
   2540     # special case for speedup
   2541     s = (x.conj() * x).real
-> 2542     return sqrt(add.reduce(s, axis=axis, keepdims=keepdims))
   2543 # None of the str-type keywords for ord ('fro', 'nuc')
   2544 # are valid for vectors
   2545 elif isinstance(ord, str):

AxisError: axis 1 is out of bounds for array of dimension 0

But I can do it on its dense equivalent. M.A will produce a memory error is M dimensions are large.

In [111]: np.linalg.norm(M.A,axis=1)
Out[111]: 
array([2.08644827, 2.61130439, 2.69501798, 2.33149559, 2.53580021,
       3.35499087, 2.14149997, 1.45864564, 2.56890956, 2.76315379])

sparse has a norm that works on M:

In [112]: sparse.linalg.norm(M,axis=1)
Out[112]: 
array([2.08644827, 2.61130439, 2.69501798, 2.33149559, 2.53580021,
       3.35499087, 2.14149997, 1.45864564, 2.56890956, 2.76315379])

For this L2 norm I can do the sparse calculation directly:

In [113]: M.power(2).sum(axis=1)
Out[113]: 
matrix([[ 4.35326638],
        [ 6.81891059],
        [ 7.2631219 ],
        [ 5.43587169],
        [ 6.43028271],
        [11.25596371],
        [ 4.58602212],
        [ 2.1276471 ],
        [ 6.59929632],
        [ 7.63501886]])

This sum produces a (1,10) dense matrix. Using A1 to make a 1d array, I get the same numbers:

In [114]: M.power(2).sum(axis=1).A1**.5
Out[114]: 
array([2.08644827, 2.61130439, 2.69501798, 2.33149559, 2.53580021,
       3.35499087, 2.14149997, 1.45864564, 2.56890956, 2.76315379])

However it's done we need a tempory array/matrix that's just as large to hold the square values. The sum reduces the dimension etc.

sparse array

I don't need the A1 step if I make a sparse_array (which sparse is moving towards slowly).

In [115]: M = sparse.random(10,100,.2, 'csr'); M = sparse.csr_array(M);M
Out[115]: 
<10x100 sparse array of type '<class 'numpy.float64'>'
    with 200 stored elements in Compressed Sparse Row format>

In [116]: sparse.linalg.norm(M,axis=1)
Out[116]: 
array([2.21844541, 2.3164772 , 2.56784187, 2.69002853, 2.57405933,
       3.44123739, 2.41489642, 1.87393488, 2.08952551, 2.93634053])

In [117]: M.power(2).sum(1)
Out[117]: 
array([ 4.92150002,  5.36606664,  6.59381185,  7.23625348,  6.62578144,
       11.84211477,  5.83172473,  3.51163195,  4.36611684,  8.62209571])

Upvotes: 0

Related Questions