Reputation: 317
I am using scikit-learn's non-negative matrix factorization (NMF) on a sparse matrix in which the zero entries represent missing data. Does scikit-learn's NMF implementation treat zero entries as actual zeros or as missing data?
Thank you!
Upvotes: 4
Views: 3358
Reputation: 86
NMF counts them as zeros. I figured it out using this code:
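import numpy as np
from scipy import sparse
from sklearn.decomposition import NMF

# Every non-zero entry is 1; the zeros stand in for "missing" values.
mat = np.array([[1, 1, 1],
                [1, 1, 0],
                [1, 0, 0]], dtype='float32')

# Build a sparse matrix that stores only the non-zero entries.
ix = np.nonzero(mat)
sparse_mat = sparse.csc_matrix((mat[ix], ix))
print('training matrix:')
print(sparse_mat.toarray())

# Fit a rank-1 NMF and reconstruct the matrix from the factors.
model = NMF(n_components=1).fit(sparse_mat)
reconstructed = model.inverse_transform(model.transform(sparse_mat))
print('reconstructed:')
print(reconstructed)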
The result:
training matrix:
[[1. 1. 1.]
[1. 1. 0.]
[1. 0. 0.]]
reconstructed:
[[1.22 0.98 0.54]
[0.98 0.78 0.44]
[0.54 0.44 0.24]]
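Note that all of the non-zero elements are ones, so a perfect reconstruction would have been possible if the zero entries had been ignored as missing. The reconstruction above is not all ones, so the zeros are clearly being fit as actual values rather than skipped.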
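To illustrate the point about ignoring the zero entries, here is a minimal sketch of a masked NMF in plain NumPy. It is not part of scikit-learn; the function `masked_nmf` and its parameters are hypothetical names, and the multiplicative updates used here are one common way to weight the loss so that only observed entries count.

```python
import numpy as np

def masked_nmf(X, mask, n_components=1, n_iter=1000, eps=1e-9, seed=0):
    """Hypothetical helper: NMF that fits only the entries where mask == 1.

    Uses weighted multiplicative updates for the squared error
    ||mask * (X - W @ H)||^2. Not a scikit-learn API.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    # Strictly positive random initialisation.
    W = rng.random((n_rows, n_components)) + 0.1
    H = rng.random((n_components, n_cols)) + 0.1
    MX = mask * X
    for _ in range(n_iter):
        # Masked entries contribute nothing to numerator or denominator.
        W *= (MX @ H.T) / ((mask * (W @ H)) @ H.T + eps)
        H *= (W.T @ MX) / (W.T @ (mask * (W @ H)) + eps)
    return W, H

X = np.array([[1, 1, 1],
              [1, 1, 0],
              [1, 0, 0]], dtype=float)
mask = (X != 0).astype(float)  # 1 = observed, 0 = treated as missing

W, H = masked_nmf(X, mask)
reconstructed = W @ H
print(np.round(reconstructed, 2))
```

With the zeros masked out, the observed entries should be reconstructed very close to 1, unlike the scikit-learn output above.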
Upvotes: 4
Reputation: 8312
In your data matrix the missing values can be represented as 0, but for a very sparse matrix you would usually avoid storing all those zeros explicitly and use a sparse format instead, such as COO (row, column, value triples) or CSR (compressed sparse row).
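As a rough illustration of the sparse storage mentioned above, the sketch below builds the same small matrix in COO and CSR form with `scipy.sparse`. The example matrix is an assumption borrowed from the other answer, not your actual data.

```python
import numpy as np
from scipy import sparse

dense = np.array([[1, 1, 1],
                  [1, 1, 0],
                  [1, 0, 0]], dtype=np.float32)

# COO keeps one (row, col, value) triple per non-zero entry.
coo = sparse.coo_matrix(dense)
print(coo.row, coo.col, coo.data)

# CSR keeps the column indices and values row by row, plus row pointers.
csr = coo.tocsr()
print(csr.indptr, csr.indices, csr.data)

# Neither format stores the zeros explicitly.
print(coo.nnz, csr.nnz)
```

Either format can be converted back with `.toarray()`, at which point the unstored positions come back as zeros.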
If you are using NMF for recommendations, then you factorise your data matrix X by finding W and H such that W.H approximately equals X, with the condition that all three matrices are non-negative. When you reconstruct X as W.H, some of the missing values (where you stored zeros) may become non-zero and some may remain zero. The values in the reconstructed matrix at those positions are your predictions.
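The factorisation described above can be sketched with scikit-learn as follows. The toy matrix, `n_components=1`, `init="random"`, and `random_state=0` are assumptions chosen for illustration; `fit_transform` returns W and `components_` holds H.

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy data: zeros mark the positions we would like predictions for.
X = np.array([[1, 1, 1],
              [1, 1, 0],
              [1, 0, 0]], dtype=float)

model = NMF(n_components=1, init="random", random_state=0, max_iter=500)
W = model.fit_transform(X)   # shape (n_samples, n_components)
H = model.components_        # shape (n_components, n_features)

# The reconstruction W.H has a value at every position, including
# those that were zero in X.
X_hat = W @ H
print(np.round(X_hat, 2))
```

Because the zeros are part of the loss during fitting, the predicted values at those positions are pulled towards zero rather than being free to take any value.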
So to answer your question of whether they are zeros or missing data in the NMF model: I would count them as zeros. Once fit, the NMF model gives you a predicted value at every position, including those that were zero, and that is how it is used as a method of predicting missing values in the data.
Upvotes: 3