formidable93

Reputation: 45

Fast calculation of eigenvector centrality takes too long in networkx

I am using networkx to calculate eigenvector centrality. The problem is that it takes far too long (it has already been running for about 6 hours). Is there a faster way to get the results?

There are around 200,000 nodes in the graph and 60,000,000 edges.

Upvotes: 3

Views: 2269

Answers (2)

seralouk

Reputation: 33147

Looking at the source code, networkx.algorithms.centrality.eigenvector uses the power iteration method to find the leading eigenvector.

If you stick with networkx, use eigenvector_centrality_numpy, as Joel noted:

centrality = nx.eigenvector_centrality_numpy(G)
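For instance, a self-contained sketch (the random graph here is only a stand-in for the question's 200,000-node graph):

import networkx as nx

# Stand-in for the real 200,000-node / 60,000,000-edge graph.
G = nx.gnm_random_graph(10_000, 300_000, seed=42)

# ARPACK-backed eigenvector centrality; returns a dict mapping node -> score.
centrality = nx.eigenvector_centrality_numpy(G)
print(sorted(centrality, key=centrality.get, reverse=True)[:5])  # five most central nodes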

Alternatively:

You can use scipy.sparse.linalg.eigs, which wraps ARPACK, and request that only one eigenvector be returned.

Toy example:

import scipy.sparse as sparse
from scipy.sparse.linalg import eigs

# Sparse adjacency matrix; in practice 200,000 by 200,000.
X = sparse.random(1000, 1000, density=0.001, format='csr')
# Note: k=1 requests a single eigenpair, and which='LR' selects the
# eigenvalue with the largest real part.
vals, vecs = eigs(X, k=1, which='LR')

In any case, 200,000 by 200,000 is big, and depending on the sparsity and the nature of the matrix, the algorithm may take a long time. You will also need a lot of CPU and RAM.
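To run ARPACK on the actual graph, you first need its adjacency matrix in sparse form. A minimal sketch, assuming a networkx version that provides nx.to_scipy_sparse_array (older releases call it nx.to_scipy_sparse_matrix):

import networkx as nx
import numpy as np
from scipy.sparse.linalg import eigs

G = nx.gnm_random_graph(10_000, 300_000, seed=0)  # stand-in for the real graph

# Adjacency matrix in sparse CSR form.
A = nx.to_scipy_sparse_array(G, dtype=float)

# Leading eigenpair via ARPACK.
vals, vecs = eigs(A, k=1, which='LR')

# Take the real part, fix the sign (the leading eigenvector of a connected
# graph has a single sign), normalize, and map scores back to node labels.
v = np.abs(np.real(vecs[:, 0]))
centrality = dict(zip(G.nodes(), v / np.linalg.norm(v)))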

Additional tip for networkx.algorithms.centrality.eigenvector:

If you stick with networkx, try relaxing the tolerance:

eigenvector_centrality(G, max_iter=100, tol=1e-06, nstart=None, weight=None)

Try setting tol=1e-04 or even tol=1e-03, for example:
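A minimal sketch (the graph and the exact values are illustrative, not tuned):

import networkx as nx

G = nx.gnm_random_graph(1000, 10_000, seed=0)  # stand-in for the real graph

# A looser tolerance trades some accuracy for faster convergence; a larger
# max_iter gives the power iteration more room if it needs it.
centrality = nx.eigenvector_centrality(G, max_iter=1000, tol=1e-03)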

Upvotes: 2

Joel

Reputation: 23827

Try using eigenvector_centrality_numpy. From the documentation:

This algorithm uses the SciPy sparse eigenvalue solver (ARPACK) to find the largest eigenvalue/eigenvector pair.

So this will do serafeim's calculation, with a tiny bit of additional processing.
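To see what that "additional processing" amounts to, here is a hedged sketch comparing the two paths on a small built-in graph; the unit-Euclidean-norm step is an assumption based on networkx's documented normalization convention:

import networkx as nx
import numpy as np
from scipy.sparse.linalg import eigs

G = nx.karate_club_graph()

# networkx's ARPACK-backed result...
nx_scores = nx.eigenvector_centrality_numpy(G)

# ...versus the leading eigenvector computed directly with SciPy.
_, vecs = eigs(nx.to_scipy_sparse_array(G, dtype=float), k=1, which='LR')
v = np.abs(np.real(vecs[:, 0]))
v /= np.linalg.norm(v)  # assumed: networkx normalizes to unit Euclidean norm

# The two should agree up to ARPACK's convergence tolerance.
print(max(abs(nx_scores[n] - v[i]) for i, n in enumerate(G.nodes())))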

Upvotes: 1
