Ju Ko
Ju Ko

Reputation: 508

How does the R igraph package compute Closeness Centrality?

I have the following network:

g <- graph(c("Amy", "Ram",
          "Ram", "Li",
          "Li", "Amy",
          "Amy", "Li",
          "Kate", "Li"), directed=TRUE)

and would like to understand how Closeness centrality in this network can be computed. From my understanding of the documentation, Closeness is the reciprocal of the mean of all shortest paths from one vertex to every other vertex of the network. Intuitively, I would thus compute it like so:

Dist <- distances(g, mode="out")
Dist[Dist == Inf] <- NA
1/rowMeans(Dist, na.rm=T)

However, this is most likely incorrect, as the values from the built-in igraph function to compute closeness centrality show different results:

closeness(g, mode = "out")

I would like to understand how closeness is calculated and how I can get to the same results step by step without using the built-in function.

Upvotes: 8

Views: 3593

Answers (3)

Alvaro Figueira
Alvaro Figueira

Reputation: 11

The version 1.5.0 of the igraph has the computation swapped between the normalized and not normalized. Therefore, if you ask closeness(g, vids="f", normalized = TRUE) it will give you the un-normalized, whereas closeness(g, vids="f") will give you the normalized result.

Upvotes: 1

Dylan_Gomes
Dylan_Gomes

Reputation: 2232

It looks like igraph may have changed since the excellent answer by @G5W was posted. The igraph documentation for closeness has changed:

The closeness centrality of a vertex is defined as the inverse of the sum of distances to all the other vertices in the graph:

If there is no (directed) path between vertex v and i, then i is omitted from the calculation. If no other vertices are reachable from v, then its closeness is returned as NaN.

And so has the output. Using the original example:

g <- graph(c("Amy", "Ram",
          "Ram", "Li",
          "Li", "Amy",
          "Amy", "Li",
          "Kate", "Li"), directed=TRUE)

closeness(g, mode = "out")

      Amy       Ram        Li      Kate 
0.5000000 0.3333333 0.3333333 0.1666667 

The results differ from the above example and the correct answer isn't reproduced by the accepted answer (linked above and reproduced here) because the function igraph::closeness has presumably changed:

Dist <- distances(g, mode="out")
Dist[Dist == Inf] <- vcount(g)
1/rowSums(Dist)

      Amy       Ram        Li      Kate 
0.1666667 0.1428571 0.1428571 0.1666667 

Note: this still matches the output of @G5W, but no longer matches closeness(g, mode = "out"). Instead something more similar to the OP, except with rowSums rather than rowMeans gives a match to igraph::closeness:

Dist <- distances(g, mode="out")
Dist[Dist == Inf] <- NA
1/rowSums(Dist,na.rm=T)

      Amy       Ram        Li      Kate 
0.5000000 0.3333333 0.3333333 0.1666667 

Session Info:

> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] qgraph_1.9.2  igraph_1.3.2  tidyr_1.2.0   dplyr_1.0.9  
[5] ggplot2_3.4.0

loaded via a namespace (and not attached):
 [1] splines_4.2.2       gtools_3.9.2.2     
 [3] Formula_1.2-4       assertthat_0.2.1   
 [5] stats4_4.2.2        latticeExtra_0.6-29
 [7] yaml_2.3.5          pbivnorm_0.6.0     
 [9] pillar_1.7.0        backports_1.4.1    
[11] lattice_0.20-45     glue_1.6.2         
[13] digest_0.6.29       RColorBrewer_1.1-3 
[15] checkmate_2.1.0     colorspace_2.0-3   
[17] sandwich_3.0-2      htmltools_0.5.2    
[19] Matrix_1.5-1        plyr_1.8.7         
[21] psych_2.2.5         pkgconfig_2.0.3    
[23] purrr_0.3.4         xtable_1.8-4       
[25] corpcor_1.6.10      mvtnorm_1.1-3      
[27] scales_1.2.0        glasso_1.11        
[29] jpeg_0.1-9          fdrtool_1.2.17     
[31] emmeans_1.8.2       tibble_3.1.7       
[33] htmlTable_2.4.0     generics_0.1.2     
[35] farver_2.1.0        ellipsis_0.3.2     
[37] TH.data_1.1-1       withr_2.5.0        
[39] pbapply_1.5-0       nnet_7.3-18        
[41] mnormt_2.1.0        cli_3.6.0          
[43] survival_3.4-0      magrittr_2.0.3     
[45] crayon_1.5.1        estimability_1.4.1 
[47] evaluate_0.15       fansi_1.0.3        
[49] nlme_3.1-160        MASS_7.3-58.1      
[51] foreign_0.8-83      tools_4.2.2        
[53] data.table_1.14.2   lifecycle_1.0.3    
[55] multcomp_1.4-20     stringr_1.4.0      
[57] munsell_0.5.0       cluster_2.1.4      
[59] compiler_4.2.2      rlang_1.0.6        
[61] grid_4.2.2          rstudioapi_0.13    
[63] htmlwidgets_1.5.4   lavaan_0.6-11      
[65] base64enc_0.1-3     labeling_0.4.2     
[67] rmarkdown_2.14      gtable_0.3.0       
[69] codetools_0.2-18    abind_1.4-5        
[71] DBI_1.1.2           reshape2_1.4.4     
[73] R6_2.5.1            gridExtra_2.3      
[75] zoo_1.8-10          knitr_1.39         
[77] performance_0.10.1  fastmap_1.1.0      
[79] utf8_1.2.2          Hmisc_4.7-0        
[81] insight_0.18.8      stringi_1.7.6      
[83] parallel_4.2.2      Rcpp_1.0.8.3       
[85] vctrs_0.5.1         rpart_4.1.19       
[87] png_0.1-7           tidyselect_1.1.2   
[89] xfun_0.31           coda_0.19-4        

Upvotes: 2

G5W
G5W

Reputation: 37641

There are several things going on here. You do have a mistake in your code, but the big problem is with the closeness function - either its implementation or its documentation. First, what are we supposed to compute? The igraph documentation for closeness says:

The closeness centrality of a vertex is defined by the inverse of the average length of the shortest paths to/from all the other vertices in the graph:

1/sum( d(v,i), i != v)

If there is no (directed) path between vertex v and i then the total number of vertices is used in the formula instead of the path length.

Let's compare that with what it says in the Wikipedia article on closeness_centrality.

Closeness was defined by Bavelas (1950) as the reciprocal of the farness, that is:
C(x) = 1 / ∑ d(y,x)

where d(y,x) is the distance between vertices x and y. When speaking of closeness centrality, people usually refer to its normalized form which represents the average length of the shortest paths instead of their sum. It is generally given by the previous formula multiplied by N − 1 , where N is the number of nodes in the graph. For large graphs this difference becomes inconsequential so the −1 is dropped resulting in:
C(x) = N / ∑ d(y,x)

This adjustment allows comparisons between nodes of graphs of different sizes.

First off, the igraph documentation takes the sum over i != v.
The words say "the inverse of the average length" which would imply C(x) = (N-1) / ∑ d(y,x) but the formula says 1 / ∑ d(y,x). In fact, we will see that what the closeness function computes corresponds to this original definition despite the words indicating the normalized version.

But there is one other problem. You changed the Inf values to NA and then used na.rm=T. Notice the last sentence in the igraph documentation.

If there is no (directed) path between vertex v and i then the total number of vertices is used in the formula instead of the path length.

You are not supposed to ignore these nodes. You are supposed to set the distance to the total number of nodes in the graph. So, to get the same result as produced by igraph, you need to compute:

Dist <- distances(g, mode="out")
Dist[Dist == Inf] <- vcount(g)
1/rowSums(Dist)
      Amy       Ram        Li      Kate 
0.1666667 0.1428571 0.1428571 0.1666667 
closeness(g, mode = "out")
      Amy       Ram        Li      Kate 
0.1666667 0.1428571 0.1428571 0.1666667 

Certainly, the igraph documentation is inconsistent. The words say that it computes the normalized closeness, but the formula (and what it actually computes) is the un-normalized form.

I hope that this makes it clear what is being computed and helps you pick what you want to use for your analysis.

BTW: When you compute 1/rowMeans(Dist), you are including the v=i case (where the distance is zero) which igraph leaves out. That means that you are computing C(x) = N / ∑ d(y,x) rather than C(x) = (N-1) / ∑ d(y,x). As noted in Wikipedia, for large graphs, they are essentially the same, but I just want to be sure that you are aware of what you are computing.

Upvotes: 6

Related Questions