BBSysDyn

Reputation: 4601

Performance Problems, Clustering Using Affinity Matrix, Eigenvalues

I am trying to use spectral clustering on an image. I first calculate the affinity matrix, then attempt to get its eigenvectors. However, the eig() call on a 7056x7056 matrix is taking too long. Any suggestions on how to speed this up? Perhaps I should use a different form of affinity?

import matplotlib.pyplot as plt
import numpy as np

Img = plt.imread("twoObj.bmp")
Img2 = Img.flatten()                 # flatten image to a 1-D array of pixel values
(n,) = Img2.shape                    # n = 7056 here
A = np.subtract.outer(Img2, Img2)    # n x n matrix of pairwise differences
w, v = np.linalg.eig(A)              # note: eig returns (eigenvalues, eigenvectors)

Upvotes: 5

Views: 1051

Answers (4)

andy_a

Reputation: 141

According to a paper by eminent mathematician and LAPACK contributor I.S. Dhillon, spectral clustering problems can be transformed into what are called kernel k-means problems. This can reduce the computation by a factor of 1000. They have implemented the algorithm in a free software release on the University of Texas website. I haven't tried it yet, but it looks like the real thing. Certainly the overhead of a system() call is nothing compared to a large eigenvector computation.
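To make the idea concrete, here is a minimal kernel k-means sketch that operates directly on a precomputed affinity matrix, so no eigendecomposition is needed. This is my own toy version assuming a symmetric, positive-semidefinite kernel, not the optimized Texas release; the kernel_kmeans name and its parameters are invented for illustration:

import numpy as np

def kernel_kmeans(K, n_clusters, n_iter=20, seed=0):
    # Cluster using only the kernel/affinity matrix K (n x n).
    rng = np.random.RandomState(seed)
    n = K.shape[0]
    labels = rng.randint(n_clusters, size=n)
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.empty((n, n_clusters))
        for c in range(n_clusters):
            mask = labels == c
            if not mask.any():
                dist[:, c] = np.inf  # empty cluster: never the closest
                continue
            # squared feature-space distance to the cluster mean:
            # K_ii - 2 * mean_j K_ij + mean_jl K_jl  (j, l in cluster c)
            dist[:, c] = (diag
                          - 2 * K[:, mask].mean(axis=1)
                          + K[np.ix_(mask, mask)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels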

Upvotes: 1

doug

Reputation: 70038

The linalg module from scipy.sparse has three functions that will often help in situations like this (even if your matrix is not sparse). In short, the solution techniques that power these functions are better suited to computation with much larger matrices (these functions wrap different underlying Fortran routines from ARPACK, dseupd among them).

Here's another reason to look at the analogous functions in scipy.sparse: a substantial amount of computational effort is spared if the algorithm isn't forced to find all eigenvectors/eigenvalues (which you almost never need, and certainly don't need for your particular use). The eigenvalue functions in scipy.sparse.linalg give you explicit control over this. In particular, the eigs function in scipy.sparse.linalg accepts a parameter k, which is the number of eigenvalues/eigenvectors you want.
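For example, a sketch using the symmetric variant eigsh (assuming the affinity is made symmetric, as in the question; use eigs for the non-symmetric case):

import numpy as np
from scipy.sparse.linalg import eigsh

x = np.random.random(1000)
A = np.abs(np.subtract.outer(x, x))  # symmetric affinity, as in the question

# ask ARPACK for just the k=2 largest-magnitude eigenpairs
# rather than all 1000
w, v = eigsh(A, k=2, which='LM')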

Upvotes: 2

Joe Kington

Reputation: 284602

One quick and easy optimization is to use np.linalg.eigh. (And np.linalg.eigvalsh if you just want the eigenvalues.)

Because you have a symmetric matrix (assuming you take the absolute value), you can "tell" numpy to use a more efficient algorithm this way.

import numpy as np

x = np.random.random(1000)
A = np.subtract.outer(x, x)
A = np.abs(A)               # absolute value makes the matrix symmetric
w, v = np.linalg.eigh(A)    # eigh exploits symmetry; much faster than eig

Comparing timings, eigh takes ~5.3 seconds while eig takes ~23.4 seconds.

The performance of np.linalg.eig, etc. is going to be strongly dependent on which libraries numpy is linked against. Using a heavily optimized BLAS library (e.g. ATLAS or Intel's MKL) can make a very dramatic difference, especially in this case.

Also, depending on how numpy was built (e.g. whether or not a Fortran compiler was available), scipy.linalg.eigh etc. may be faster. There's also a chance that scipy and numpy may be linked against different BLAS libraries, though this is rather unlikely.
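If you want to check which BLAS/LAPACK your installs are actually linked against, show_config is a quick diagnostic:

import numpy as np
import scipy

np.show_config()     # lists the BLAS/LAPACK numpy was built against
scipy.show_config()  # same for scipy; compare the two outputs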

Upvotes: 4

eat

Reputation: 7530

First of all, based on how you constructed your matrix A, it will be an anti-symmetric (aka skew-symmetric) matrix, and its rank will very likely be 2.

Perhaps you should take only the eigenvectors corresponding to the two largest eigenvalues. However, the eigenvalues of a skew-symmetric matrix are purely imaginary, so eig will return complex values.

Anyway, it might be that working with the SVD (singular value decomposition) will actually be more straightforward.
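For instance, a sketch using svds from scipy.sparse.linalg, which returns only the top k singular triplets (the k=2 choice follows from the expected rank-2 structure; purely illustrative):

import numpy as np
from scipy.sparse.linalg import svds

x = np.random.random(1000)
A = np.subtract.outer(x, x)  # skew-symmetric, (very likely) rank 2

# only the two largest singular values/vectors are needed
u, s, vt = svds(A, k=2)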

Please feel free to elaborate more on what you are aiming for.

Upvotes: 2
