David
David

Reputation: 696

Matlab and Python Reading Binary File Differently

I am reading the same binary file into Python and Matlab and putting it into a matrix. When I take the norm of this matrix, I get different results.

I am using the respective smatload function to load the binary file.

Python:

def smatload(filename):
    #print 'opening: ', filename
    f = open(filename, 'rb')
    m = np.fromfile(f,'q',1)
    n = np.fromfile(f,'q',1)
    nnz = np.fromfile(f,'q',1)
    print 'reading %d x %d with %d non-zeros' % (m,n,nnz)
    S = np.fromfile(f,'d',3*nnz)
    f.close()
    S = S.reshape((nnz,3))
    rows = S[:,0].astype(int) - 1
    cols = S[:,1].astype(int) - 1
    vals = S[:,2]
    return csr_matrix((vals,(rows,cols)),shape=(m,n))

Matlab:

function [A] = smatload(filename)

fid = fopen(filename,'r');
if( fid == -1 )
    disp(sprintf('Error: Unable to open file [%s], fid=%d\n',filename,fid));
    A = [-1];
    fclose(fid);
    return;
end

m   = fread(fid,[1 1],'uint64');
n   = fread(fid,[1 1],'uint64');
nnz = fread(fid,[1 1],'uint64');

fprintf('Reading %d x %d with %d non-zeros\n',m,n,nnz);

S = fread(fid,[3 nnz],'double');
fclose(fid);
A = sparse(S(1,:)',S(2,:)',S(3,:)',m,n);

The results that I get for the norms of the returned matrices are

Matlab: norm(A.'fro') = 0.018317077159881

Python: np.linalg.norm(A) = 0.018317077159760

I have confirmed that they are both reading the correct number of values (6590x7126 matrix, 122526 non-zeros), and that I am using the same norm for both (frobenius).

Any ideas as to what might cause this?

Upvotes: 1

Views: 1029

Answers (3)

Wilf Rosenbaum
Wilf Rosenbaum

Reputation: 528

Well Matlab definitely seems to have different implementation for sparse and for dense arrays. Using the 4425x7126 sparse matrix A with 54882 non zero entries that you linked to and the following commands :

FA=full(A);
av=A(:);
fav=FA(:);

I would expect the following commands to all yield the same value, because they are all computing the square root of the sum of the squares of the (non zero) elements of A :

norm(A,'fro')
norm(av,2)
norm(FA,'fro')
norm(fav,2)

sqrt( sum(av .* av) )
sqrt( sum(av .^ 2) )

sqrt( sum(fav .* fav) )
sqrt( sum(fav .^ 2) )

In fact we see three slightly different answers :

 norm(A,'fro')             0.0223294051001499
 norm(av,2)                0.0223294051001499
 norm(FA,'fro')            0.0223294051001499
 norm(fav,2)               0.0223294051001499

 sqrt( sum(av .* av) )     0.0223294051001521
 sqrt( sum(av .^ 2) )      0.0223294051001521

 sqrt( sum(fav .* fav) )   0.0223294051001506
 sqrt( sum(fav .^ 2) )     0.0223294051001506

In fact, even the reported sum of the elements of the sparse and dense representations of A are (a little bit) different :

sum(A(:))                 1.00000000000068
sum(FA(:))                1.00000000000035

These differences seem to be of the same order of magnitude that you were seeing between the Python and Matlab norms.

Upvotes: 2

Wilf Rosenbaum
Wilf Rosenbaum

Reputation: 528

Not an answer but I don't have enough rep to comment. Would it be worthwhile to try to narrow the problem down a bit? If you partitioned your original matrix into 4 sub-matrices (Top left, top right, bottom left, bottom right) and compared the Frobenius norm reported in Matlab and in Python for each sub-matrix do you still see a discrepency between any values? If yes, then rinse and repeat on that sub-matix. If no, then don't even waste your time reading this comment. :-)

Upvotes: 1

jeremy
jeremy

Reputation: 307

A quick look at the Frobenius Norm shows it requires squaring all of the values and adding them together.

Since you have uint64 in the read command, it looks like you could be filling up the floating point storage. When you multiply two binary numbers, it takes twice as many bits to store the answer. This means you would need 128 bits to store all the decimal values. If Python and MATLAB are doing this differently, that could explain why your decimal values are different.

See these two links for information about how MATLAB and Python handle floating point accuracy:

Python: https://docs.python.org/2/tutorial/floatingpoint.html

MATLAB: http://blogs.mathworks.com/cleve/2014/07/07/floating-point-numbers/

Upvotes: 4

Related Questions