Why should I reverse the arc of a Networkx DiGraph before calculating closeness centrality?

Question

I have noted this strange behaviour in the Networkx library, which (at least to me) is somehow unexpected. In the example below I create a directed graph of 4 nodes, where each node is only connected to its successor. I would expect that node 4 (which can be reached from 1, 2 and 3) should have the highest centrality, but I get something different.

pd=nx.DiGraph()
pd.add_edges_from([(0,1),(1,2),(2,3),(3,4)])
nx.closeness_centrality(pd)
{0: 0.4, 1: 0.375, 2: 0.3333333333333333, 3: 0.25, 4: 0.0}

To get what I expect i should .reverse() the graph.

Why?

Joel · Accepted Answer

If you check the documentation it says:

Closeness centrality of a node u is the reciprocal of the sum of the shortest path distances from u to all n−1 other nodes. Since the sum of distances depends on the number of nodes in the graph, closeness is normalized by the sum of minimum possible distances n−1. [emphasis mine]

Your node 4 has no paths from it to other nodes, just from other nodes to it.

A reasonable definition that takes into account paths of both directions would be to take the reciprocal of the results of closeness_centrality and the reciprocal of the results of closeness_centrality on the .reverse() graph, add them and then take the reciprocal (being careful with 0 values). An issue that you would see is that if there is a pair u and v which don't have a directed path between them, both would have a final centrality of 0.

Another option would involve defining the centrality so that for every pair of nodes u and v, you take the distance of the shortest path in either direction. I think this would involve rewriting the algorithm. You can find the current algorithm through the documentation, where there is a link to source (it's here). It's not particularly complicated, so I think it's doable.

Why should I reverse the arc of a Networkx DiGraph before calculating closeness centrality?

Answers (1)

Related Questions