Reputation: 355
I have two N x N co-occurrence matrices (484x484 and 1060x1060) that I have to analyze. The matrices are symmetrical along the diagonal and contain lots of zero values. The non-zero values are integers.
I want to group together the positions that are non-zero. In other words, what I want to do is the algorithm on this link. When order by cluster is selected, the matrix gets re-arranged in rows and columns to group the non-zero values together.
Since I am using Python for this task, I looked into SciPy Sparse Linear Algebra library, but couldn't find what I am looking for.
Any help is much appreciated. Thanks in advance.
Upvotes: 6
Views: 4955
Reputation: 77454
Because your data is sparse, treat it as a graph, not a matrix.
Then try the various graph clustering methods. For example cliques are interesting on such data.
Note that not everything may cluster.
Upvotes: 4
Reputation: 285
If you have a matrix dist
with pairwise distances between objects, then you can find the order on which to rearrange the matrix by applying a clustering algorithm on this matrix (http://scikit-learn.org/stable/modules/clustering.html). For example it might be something like:
from sklearn import cluster
import numpy as np
model = cluster.AgglomerativeClustering(n_clusters=20,affinity="precomputed").fit(dist)
new_order = np.argsort(model.labels_)
ordered_dist = dist[new_order] # can be your original matrix instead of dist[]
ordered_dist = ordered_dist[:,new_order]
The order is given by the variable model.labels_
, which has the number of the cluster to which each sample belongs. A few observations:
affinity="precomputed"
option to tell it that we are using pre-computed distances).dist=1 - data/data.max()
)Upvotes: 7