Daniel

Reputation: 572

Computation of network homophily

I am quite confused about how to calculate network homophily in network analysis. Currently I compute homophily with the following function, which was written and described at this URL: http://dappls.umasscreate.net/networks/calculating-network-homophily-part-1/. The technique measures homophily as the proportion of all edges in the network that connect nodes sharing the same attribute value. My goal is to measure homophily in a directed network.

Function

library(igraph)

homophily <- function(graph, vertex.attr, attr.val = NULL, prop = TRUE) {
  # Use the attribute values as vertex names so the edgelist shows them
  V(graph)$name <- vertex_attr(graph, vertex.attr)

  # Get the basic edgelist (columns: from, to)
  ee <- get.data.frame(graph)

  # No particular attribute value given: proportion (prop = TRUE)
  # or count (prop = FALSE) of all ties between nodes with matching attribute
  if (is.null(attr.val)) {
    same <- sum(ee[, 1] == ee[, 2])
    if (prop) same / nrow(ee) else same

  # Attribute value given: proportion (prop = TRUE) or count (prop = FALSE)
  # of ties among nodes with that value; the proportion is taken over all
  # ties that touch at least one node with that value
  } else {
    both <- sum(ee[, 1] == attr.val & ee[, 2] == attr.val)
    if (prop) both / sum(ee[, 1] == attr.val | ee[, 2] == attr.val) else both
  }
}
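To make the two branches concrete, here is a minimal base-R sketch of the same calculation on a small, hypothetical edge list. It assumes the endpoints have already been replaced by their group labels, which is what setting `V(graph)$name` does in the function above. The helper name `edge_homophily` is made up for illustration.

```r
# Hypothetical toy edge list: each row is a directed tie, with the
# endpoints already replaced by their group labels
ee <- data.frame(
  from = c(1, 1, 2, 2, 3, 3),
  to   = c(1, 2, 2, 3, 1, 3)
)

# Hypothetical helper mirroring homophily(): overall or per-group proportion
edge_homophily <- function(ee, attr.val = NULL) {
  if (is.null(attr.val)) {
    # share of all ties whose two endpoints are in the same group
    mean(ee$from == ee$to)
  } else {
    # ties with both endpoints in attr.val, divided by ties touching attr.val
    both    <- ee$from == attr.val & ee$to == attr.val
    touches <- ee$from == attr.val | ee$to == attr.val
    sum(both) / sum(touches)
  }
}

edge_homophily(ee)     # 0.5: 3 of the 6 ties are within-group
edge_homophily(ee, 1)  # 1/3: 1 of the 3 ties touching group 1 stays inside it
```

Note that the per-group denominator counts both outgoing and incoming ties of that group, so it is not the same quantity as the overall proportion restricted to one group.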

Sample Data

set.seed(5165)
# Random directed graph with 100 nodes and a 30% chance of a tie
gg <- random.graph.game(100, 0.3, "gnp", directed = TRUE)

# Randomly assign the node attribute (group numbers 1:5)
V(gg)$group <- sample(1:5, 100, replace = TRUE)

Output

Applying the function to the sample data gives the following output, which means that about 20% of the ties in the network are between actors in the same group. The function can also compute this proportion for a specific group (via attr.val).
homophily(graph = abc, vertex.attr = "group")
[1] 0.1971504
However, I also noticed that the igraph package provides its own homophily measure, assortativity(), described in the igraph documentation. Running it gives a completely different result, because it returns the assortativity coefficient, which ranges from -1 to 1. The coefficient is positive if similar vertices (based on some external property) tend to connect to each other, and negative otherwise.
library(igraph)
assortativity(abc, V(abc)$group, directed=T)
[1] -0.02653782
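One reason the two numbers differ: an assortativity coefficient compares the observed within-group share of ties against the share expected if ties formed at random with respect to group. Also, `assortativity()` treats the attribute values as numbers; for a categorical attribute like `group`, igraph offers `assortativity_nominal()` instead. The base-R sketch below spells out the directed nominal (Newman) coefficient r = (Σ e_ii − Σ a_i b_i) / (1 − Σ a_i b_i), where e_ij is the fraction of ties from group i to group j. The helper `nominal_assortativity` and the toy data are hypothetical.

```r
# Hypothetical toy edge list: endpoints replaced by their group labels
from <- c(1, 1, 2, 2, 3, 3)
to   <- c(1, 2, 2, 3, 1, 3)

nominal_assortativity <- function(from, to) {
  groups <- sort(unique(c(from, to)))
  # mixing matrix: e[i, j] = fraction of ties from group i to group j
  e <- table(factor(from, levels = groups), factor(to, levels = groups)) / length(from)
  a <- rowSums(e)         # share of ties sent by each group
  b <- colSums(e)         # share of ties received by each group
  expected <- sum(a * b)  # within-group share expected under random mixing
  (sum(diag(e)) - expected) / (1 - expected)
}

nominal_assortativity(from, to)  # 0.25
```

Half of these ties are within-group, but with three equally represented groups about a third would be expected by chance, so r = (0.5 - 1/3) / (1 - 1/3) = 0.25. On the real graph, `assortativity_nominal(gg, V(gg)$group, directed = TRUE)` should be the comparable igraph call, since `group` is already coded as integers starting at 1.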

Question

So I am now unsure which of these methods is the right one for measuring homophily in a network, since they give different results. I also noticed that the igraph method does not support the calculation for particular groups. I would rather go with the first, self-coded one (though I am not sure it is free of mistakes), because its interpretation makes more sense to me. So my question is: which of these methods is the right one for measuring homophily in a network?

Upvotes: 2

Views: 3765

Answers (1)

knapply

Reputation: 667

Can you clarify "but is it not somehow the same"?

I couldn't access the link initially and misspoke. The homophily() function above isn't the same as assortativity() and requires a different interpretation.

Original setup and data:

library(igraph)
set.seed(5165)
gg <- random.graph.game(100, 0.3, "gnp", directed = TRUE)
V(gg)$group <- sample(1:5, 100, replace = TRUE)

Original function:

homophily <- function(graph,vertex.attr,attr.val=NULL,prop=T){
  V(graph)$name<-vertex_attr(graph,vertex.attr)
  ee<-get.data.frame(graph)
  if(is.null(attr.val)){
    ifelse(prop==T,sum(ee[,1]==ee[,2])/nrow(ee),sum(ee[,1]==ee[,2]))
  } else {
    ifelse(prop==T,sum(ee[,1]==attr.val & ee[,2]==attr.val)/nrow(ee[ee[,1]==attr.val|ee[,2]==attr.val,]),
           sum(ee[,1]==attr.val & ee[,2]==attr.val))
  }
}

Original results:

homophily(gg, "group")
#> [1] 0.2017368

New function that returns all the relevant details:

new_homophily <- function(graph, vertex.attr) {
  V(graph)$name <- vertex_attr(graph, vertex.attr)
  edges <- get.data.frame(graph)
  
  # heterophilous ties where vertices have different `"group"` attributes
  external <- length(which(edges$from != edges$to))
  
  # homophilous ties where vertices have the same `"group"` attributes
  internal <- length(which(edges$from == edges$to))
  
  list(
    n_external = external,
    n_internal = internal,
    prop_external = external / nrow(edges), # proportion of ties that are heterophilous
    prop_internal = internal / nrow(edges), # proportion of ties that are homophilous (the results of your initial function)
    ei_index = (external - internal) / nrow(edges) # (EL - IL) / (EL + IL)
  )
}

New results:

new_homophily(gg, "group")
#> $n_external
#> [1] 2390
#> 
#> $n_internal
#> [1] 604
#> 
#> $prop_external
#> [1] 0.7982632
#> 
#> $prop_internal
#> [1] 0.2017368        # same as the result of your initial function
#> 
#> $ei_index
#> [1] 0.5965264

Interpreting $ei_index should be more straightforward than $prop_internal. Values closer to +1 indicate more heterophily while values closer to -1 indicate more homophily.
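Because every tie is either internal or external, the E-I index carries the same information as prop_internal, only rescaled from [0, 1] to [-1, 1] and flipped in sign: EI = 1 - 2 × prop_internal. A quick check in R using the counts printed above:

```r
# Counts reported by new_homophily(gg, "group") above
n_external <- 2390
n_internal <- 604
n_total    <- n_external + n_internal  # 2994 ties

prop_internal <- n_internal / n_total
ei_index      <- (n_external - n_internal) / n_total

# EI = (E - I) / (E + I) = 1 - 2 * I / (E + I)
all.equal(ei_index, 1 - 2 * prop_internal)  # TRUE
round(ei_index, 7)                          # 0.5965264
round(prop_internal, 7)                     # 0.2017368
```

So the two measures always rank networks the same way. Like the proportion, the E-I index does not adjust for how much within-group tying the group sizes alone would produce.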

If that fits your goal, here are some alternative E/I index options:

Michal Bojanowski's routine. It's not on CRAN, but it's available at https://github.com/mbojan/isnar

isnar::ei(gg, "group") 
#> [1] 0.5965264

Full disclosure: This is my own routine. The package is decidedly unfinished and it's definitely not on CRAN. https://knapply.github.io/homophily/reference/ei_index.html

homophily::ei_index(gg, node_attr_name = "group")
#> [1] 0.5965264

Upvotes: 3
