Reputation: 696
I am reading the same binary file into Python and Matlab and putting it into a matrix. When I take the norm of this matrix, I get different results.
I am using the respective smatload function shown below in each language to load the binary file.
Python:
import numpy as np
from scipy.sparse import csr_matrix

def smatload(filename):
    f = open(filename, 'rb')
    m = int(np.fromfile(f, 'q', 1)[0])    # 'q' = 64-bit integer header fields
    n = int(np.fromfile(f, 'q', 1)[0])
    nnz = int(np.fromfile(f, 'q', 1)[0])
    print('reading %d x %d with %d non-zeros' % (m, n, nnz))
    S = np.fromfile(f, 'd', 3 * nnz)      # nnz (row, col, value) triples of doubles
    f.close()
    S = S.reshape((nnz, 3))
    rows = S[:, 0].astype(int) - 1        # 1-based file indices -> 0-based
    cols = S[:, 1].astype(int) - 1
    vals = S[:, 2]
    return csr_matrix((vals, (rows, cols)), shape=(m, n))
Matlab:
function [A] = smatload(filename)
    fid = fopen(filename,'r');
    if fid == -1
        fprintf('Error: Unable to open file [%s], fid=%d\n', filename, fid);
        A = -1;
        return;                          % nothing to close: fopen failed
    end
    m   = fread(fid, [1 1], 'uint64');
    n   = fread(fid, [1 1], 'uint64');
    nnz = fread(fid, [1 1], 'uint64');
    fprintf('Reading %d x %d with %d non-zeros\n', m, n, nnz);
    S = fread(fid, [3 nnz], 'double');   % each column is one (row, col, value) triple
    fclose(fid);
    A = sparse(S(1,:)', S(2,:)', S(3,:)', m, n);
end
The results that I get for the norms of the returned matrices are:
Matlab: norm(A,'fro') = 0.018317077159881
Python: np.linalg.norm(A) = 0.018317077159760
I have confirmed that they are both reading the correct number of values (6590 x 7126 matrix, 122526 non-zeros), and that I am using the same norm for both (Frobenius).
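For reference, one way to take accumulation order out of the comparison is to compute an order-independent reference value in Python. This is just a sketch, using the csr_matrix returned by smatload above ('matrix.bin' is a hypothetical filename):

import math

A = smatload('matrix.bin')   # hypothetical filename

# math.fsum adds the (already rounded) squares with no intermediate rounding,
# so the result does not depend on the order of summation.
ref = math.sqrt(math.fsum(float(v) * float(v) for v in A.data))
print(repr(ref))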
Any ideas as to what might cause this?
Upvotes: 1
Views: 1029
Reputation: 528
Well, Matlab definitely seems to have different implementations for sparse and for dense arrays. Using the 4425x7126 sparse matrix A with 54882 non-zero entries that you linked to, and the following commands:
FA=full(A);
av=A(:);
fav=FA(:);
I would expect the following commands to all yield the same value, because each computes the square root of the sum of the squares of the (non-zero) elements of A:
norm(A,'fro')
norm(av,2)
norm(FA,'fro')
norm(fav,2)
sqrt( sum(av .* av) )
sqrt( sum(av .^ 2) )
sqrt( sum(fav .* fav) )
sqrt( sum(fav .^ 2) )
In fact, we see three slightly different answers:
norm(A,'fro')             0.0223294051001499
norm(av,2)                0.0223294051001499
norm(FA,'fro')            0.0223294051001499
norm(fav,2)               0.0223294051001499
sqrt( sum(av .* av) )     0.0223294051001521
sqrt( sum(av .^ 2) )      0.0223294051001521
sqrt( sum(fav .* fav) )   0.0223294051001506
sqrt( sum(fav .^ 2) )     0.0223294051001506
Even the reported sums of the elements of the sparse and dense representations of A differ slightly:
sum(A(:))    1.00000000000068
sum(FA(:))   1.00000000000035
These differences are of the same order of magnitude as the difference you were seeing between the Python and Matlab norms.
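A quick way to see this order dependence is to sum the same squares in different orders. A minimal NumPy sketch; the values are made up, merely sized like the data in the question:

import numpy as np

x = np.random.RandomState(0).rand(122526) * 1e-4   # made-up values, sized like the data
sq = x * x
print(np.sqrt(np.sum(sq)))                       # NumPy's pairwise summation
print(np.sqrt(sum(float(v) for v in sq)))        # naive left-to-right summation
print(np.sqrt(sum(float(v) for v in sq[::-1])))  # same numbers, reversed order
# The three results typically agree to ~12-13 significant digits and then drift,
# the same order of magnitude as the discrepancies above.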
Upvotes: 2
Reputation: 528
Not an answer, but I don't have enough rep to comment. Would it be worthwhile to narrow the problem down a bit? If you partitioned your original matrix into four sub-matrices (top left, top right, bottom left, bottom right) and compared the Frobenius norm reported by Matlab and by Python for each sub-matrix, do you still see a discrepancy between any values? If yes, rinse and repeat on that sub-matrix (a rough sketch follows). If no, then don't even waste your time reading this comment. :-)
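A rough Python sketch of that bisection idea, assuming A is the scipy csr_matrix from the question (scipy.sparse.linalg.norm computes the Frobenius norm by default):

from scipy.sparse.linalg import norm as sparse_norm

def quadrant_norms(A):
    # Frobenius norm of each quadrant; compare these against
    # norm(A(1:i,1:j),'fro') etc. in Matlab to localize a discrepancy.
    m, n = A.shape
    i, j = m // 2, n // 2
    return [sparse_norm(A[:i, :j]), sparse_norm(A[:i, j:]),
            sparse_norm(A[i:, :j]), sparse_norm(A[i:, j:])]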
Upvotes: 1
Reputation: 307
A quick look at the Frobenius norm shows that it requires squaring all of the values and adding them together. The values are stored as 64-bit doubles, and multiplying two binary numbers takes twice as many bits to represent exactly, so the exact square of a 64-bit double can need up to 128 bits. Each square therefore has to be rounded back to 64 bits before it is accumulated. If Python and MATLAB perform that rounding and accumulation differently, that could explain why your decimal values are different.
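A small illustration of that doubling-of-bits point, using Python's exact Fraction arithmetic (0.1 is just an arbitrary example value):

from fractions import Fraction

v = 0.1                              # arbitrary double-precision value
exact = Fraction(v) * Fraction(v)    # mathematically exact square of the stored double
stored = Fraction(v * v)             # the square after rounding back to 64 bits
print(exact == stored)               # False: the exact square does not fit in a double
print(float(exact - stored))         # size of the rounding error for this one term

Summing on the order of 100,000 such rounded squares lets those per-term errors accumulate differently depending on the order of accumulation.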
See these two links for information about how MATLAB and Python handle floating-point accuracy:
Python: https://docs.python.org/2/tutorial/floatingpoint.html
MATLAB: http://blogs.mathworks.com/cleve/2014/07/07/floating-point-numbers/
Upvotes: 4