Reputation: 45
I am using networkx to calculate eigenvector centrality. The problem is that it takes way too long (it has already been running for about 6 hours). Is there a faster way to get the results?
There are around 200,000 nodes in the graph and 60,000,000 edges.
Upvotes: 3
Views: 2269
Reputation: 33147
By looking at the source code, you can see that networkx.algorithms.centrality.eigenvector
uses the power method to find the leading eigenvector.
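For reference, the power method just multiplies a vector by the adjacency matrix over and over and renormalises it, so every iteration is a full pass over all 60M edges; a minimal sketch (not networkx's exact implementation):
import numpy as np

def power_iteration(A, iters=100):
    # Start from a uniform vector and repeatedly apply A.
    x = np.ones(A.shape[0]) / A.shape[0]
    for _ in range(iters):
        x = A @ x                  # one pass over every edge
        x /= np.linalg.norm(x)     # renormalise so the vector stays bounded
    return x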
If you stick with networkx, use this, as Joel noted:
centrality = nx.eigenvector_centrality_numpy(G)
Alternatively, you can use scipy.sparse.linalg.eigs,
which uses ARPACK, and request that only one eigenvector be returned.
Toy example:
import networkx as nx
from scipy.sparse.linalg import eigs

# Sparse 200,000 x 200,000 adjacency matrix of G; eigs needs a float dtype.
# (In older networkx the function is called to_scipy_sparse_matrix.)
X = nx.to_scipy_sparse_array(G, dtype=float)
# k=1 requests only the leading eigenpair; which='LR' selects the
# eigenvalue with the largest real part.
vals, vecs = eigs(X, k=1, which='LR')
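If you go this route, you still need to map the eigenvector entries back to node labels; a minimal sketch, assuming the node order of G matches the matrix rows (the default for nx.to_scipy_sparse_array):
import numpy as np

vec = np.abs(np.real(vecs[:, 0]))  # leading eigenvector; fix the sign via abs
vec /= np.linalg.norm(vec)         # unit norm, roughly what eigenvector_centrality_numpy returns
centrality = dict(zip(G.nodes(), vec))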
In any case, 200,000 by 200,000 is big, and depending on the sparsity and the nature of the matrix, the algorithm may take a while. You will also need a lot of CPU and RAM.
Additional tip for networkx.algorithms.centrality.eigenvector:
If you stick with networkx, try relaxing the tolerance:
eigenvector_centrality(G, max_iter=100, tol=1e-06, nstart=None, weight=None)
Try setting tol=1e-04 or even tol=1e-03.
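For example, assuming G is the graph from the question:
import networkx as nx

# A looser tol means fewer power iterations; raising max_iter guards
# against hitting the iteration limit before convergence.
centrality = nx.eigenvector_centrality(G, max_iter=1000, tol=1e-04)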
Upvotes: 2
Reputation: 23827
Try using eigenvector_centrality_numpy. From the documentation:
This algorithm uses the SciPy sparse eigenvalue solver (ARPACK) to find the largest eigenvalue/eigenvector pair.
So this will do serafeim's calculation, with a tiny bit of additional processing.
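In practice it is a one-line swap; for example, to inspect the ten most central nodes afterwards (assuming G is the graph from the question):
import networkx as nx

centrality = nx.eigenvector_centrality_numpy(G)
top10 = sorted(centrality, key=centrality.get, reverse=True)[:10]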
Upvotes: 1