Reputation: 4318
I was wondering if there's any way to use a scipy.sparse.csc_matrix with mlpy in Python. I have worked with mlpy before and have always dealt with dense (non-sparse) matrices. For instance, if I have 5 features and 1 label (0 or 1) for each row, I'd have something like this:
2,3,4,5,6,0
1,2,3,4,5,1
.....
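For reference, rows in this format can be loaded and wrapped in a scipy.sparse.csc_matrix roughly as follows. This is a minimal sketch that assumes the label is always the last column. The variable names are illustrative only.

```python
import io

import numpy as np
from scipy.sparse import csc_matrix

# The two example rows from above: 5 features followed by a 0/1 label.
raw = io.StringIO("2,3,4,5,6,0\n1,2,3,4,5,1\n")

# Load the comma-separated rows into a dense (N, P + 1) array.
data = np.loadtxt(raw, delimiter=",")

# Split off the last column as the label vector (assumed layout).
X_dense = data[:, :-1]
y = data[:, -1].astype(int)

# Wrap the feature part in a compressed sparse column (CSC) matrix.
X_sparse = csc_matrix(X_dense)

print(X_sparse.shape)  # (2, 5)
print(y)               # [0 1]
```

Going through a dense array first is only reasonable for small files. For very wide data the matrix is better built directly from its non-zero entries.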
For my next project, though, I have a huge number of features (around 20,000), so a sparse matrix would be much more practical in this case.
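To illustrate why sparse storage pays off at this width, here is a sketch that builds a 20,000-column CSC matrix directly from (row, column, value) triplets and compares its storage with a dense float64 array. The data is randomly generated and the sizes (1,000 rows, 50 non-zeros per row) are arbitrary assumptions.

```python
import numpy as np
from scipy.sparse import csc_matrix

n_rows, n_features = 1000, 20000

# Hypothetical non-zero entries as (row index, feature index, value) triplets:
# each row gets 50 randomly chosen features.
rng = np.random.default_rng(0)
rows = np.repeat(np.arange(n_rows), 50)
cols = rng.integers(0, n_features, size=rows.size)
vals = rng.random(rows.size)

# Build the CSC matrix straight from the triplets; no dense matrix is formed.
# Duplicate (row, col) pairs are summed during construction.
X = csc_matrix((vals, (rows, cols)), shape=(n_rows, n_features))

# CSC stores only the data, indices and indptr arrays.
sparse_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes
dense_bytes = n_rows * n_features * np.dtype(np.float64).itemsize

print(X.shape, X.nnz)
print(f"sparse: {sparse_bytes / 1e6:.2f} MB, dense: {dense_bytes / 1e6:.0f} MB")
```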
I looked at the mlpy documentation for k-means clustering (since all I need to do now is cluster the data), and it says:
Parameters:
    x : 2d array_like object (N, P)
        data
    k : int (1 < k < N)
        number of clusters
    plus : bool
        k-means++ algorithm for initialization
    seed : int
        random seed for initialization
Returns:
    clusters, means, steps : 1d array, 2d array, int
        cluster membership in 0,...,K-1, means (K, P), number of steps
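To make the quoted interface concrete, here is a small NumPy sketch of a dense k-means with the same parameters (x, k, plus, seed) and the same return values (clusters, means, steps). `kmeans_dense` is a hypothetical name and this is not mlpy's code: it is plain Lloyd's iteration with optional k-means++ seeding. Because it expects a 2d (N, P) array, a sparse matrix has to be densified with `.toarray()` first, which is exactly what becomes impractical with 20,000 features.

```python
import numpy as np
from scipy.sparse import csc_matrix


def kmeans_dense(x, k, plus=False, seed=0, max_steps=100):
    """Hypothetical stand-in mirroring the quoted interface.

    Returns (clusters, means, steps). Dense input only.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    if not 1 < k < n:
        raise ValueError("k must satisfy 1 < k < N")
    rng = np.random.default_rng(seed)

    if plus:
        # k-means++: each new centre is drawn with probability proportional
        # to its squared distance from the nearest centre chosen so far.
        chosen = [x[rng.integers(n)]]
        for _ in range(1, k):
            c = np.array(chosen)
            d2 = ((x[:, None, :] - c[None, :, :]) ** 2).sum(-1).min(axis=1)
            total = d2.sum()
            p = d2 / total if total > 0 else None  # uniform if all points coincide
            chosen.append(x[rng.choice(n, p=p)])
        means = np.array(chosen)
    else:
        # Plain initialisation: k distinct rows picked at random.
        means = x[rng.choice(n, size=k, replace=False)].copy()

    clusters = None
    for steps in range(1, max_steps + 1):
        # Assignment step: index of the nearest mean for every row.
        d2 = ((x[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        new_clusters = d2.argmin(axis=1)
        if clusters is not None and np.array_equal(new_clusters, clusters):
            break  # assignments stopped changing
        clusters = new_clusters
        # Update step: recompute each mean; keep the old one if a cluster is empty.
        for j in range(k):
            members = x[clusters == j]
            if len(members):
                means[j] = members.mean(axis=0)
    return clusters, means, steps


# Usage: a sparse matrix must be converted to a dense array first.
X_sparse = csc_matrix([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
clusters, means, steps = kmeans_dense(X_sparse.toarray(), 2, plus=True, seed=1)
```

The broadcasting in the distance step creates an (N, K, P) temporary, which is fine for a toy example but another reason this dense approach does not scale to very wide data.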
I take this to mean that mlpy accepts only dense (non-sparse) arrays. If I am reading this wrong, please let me know.
Any help would be highly appreciated. Thanks!
Upvotes: 0
Views: 345
Reputation: 2476
I think the answer is simply that the k-means in mlpy does not work with sparse inputs. Coding an algorithm so that it works on sparse inputs is non-trivial.
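One part of what makes this non-trivial is the distance computation at the heart of k-means, which must avoid densifying the data. A common trick (shown here as an illustration, not as how any particular library does it) is to expand ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, so that X only appears in row norms and a sparse-dense product. The function name `sparse_sq_distances` is hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix


def sparse_sq_distances(X, centers):
    """Squared Euclidean distances between the rows of a sparse X (N, P)
    and dense centers (K, P), without converting X to a dense array."""
    # ||x||^2 per row, computed from the stored non-zeros only.
    x_sq = np.asarray(X.multiply(X).sum(axis=1)).ravel()
    # ||c||^2 per centre.
    c_sq = (centers ** 2).sum(axis=1)
    # Cross term x . c: sparse @ dense yields a dense (N, K) array.
    cross = X @ centers.T
    d2 = x_sq[:, None] - 2 * cross + c_sq[None, :]
    # Clip tiny negative values caused by floating-point cancellation.
    return np.maximum(d2, 0)


X = csr_matrix(np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]))
centers = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
d2 = sparse_sq_distances(X, centers)
labels = d2.argmin(axis=1)  # nearest centre per row
```

Note that the cluster means themselves are generally dense, so a sparse-aware implementation still has to manage (K, P) dense centres.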
scikit-learn's MiniBatchKMeans works on sparse input (disclaimer: I am a scikit-learn developer).
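A minimal sketch of that, assuming a recent scikit-learn. The data is randomly generated just to have a CSC matrix with 20,000 columns, and the parameter values (n_clusters=5, batch_size=100, n_init=3) are arbitrary illustrative choices.

```python
import numpy as np
from scipy.sparse import csc_matrix
from sklearn.cluster import MiniBatchKMeans

# Hypothetical data: 500 rows, 20,000 features, about 20 non-zeros per row.
rng = np.random.default_rng(0)
n_rows, n_features = 500, 20000
rows = np.repeat(np.arange(n_rows), 20)
cols = rng.integers(0, n_features, size=rows.size)
X = csc_matrix((rng.random(rows.size), (rows, cols)), shape=(n_rows, n_features))

# The scipy.sparse matrix is passed in directly; no .toarray() is needed.
km = MiniBatchKMeans(n_clusters=5, batch_size=100, n_init=3, random_state=0)
labels = km.fit_predict(X)

print(labels.shape)               # (500,)
print(km.cluster_centers_.shape)  # (5, 20000)
```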
Upvotes: 1