I am quite confused about how to calculate network homophily in network analysis. Right now I am computing homophily with the following function, which was written and described at this URL: http://dappls.umasscreate.net/networks/calculating-network-homophily-part-1/. The technique measures homophily in a network as the proportion of all edges that connect vertices sharing the same attribute value. My goal is to measure homophily in a directed network.
homophily <- function(graph, vertex.attr, attr.val = NULL, prop = T) {
  # Assign the attribute values as vertex names so the edgelist shows them
  V(graph)$name <- vertex_attr(graph, vertex.attr)
  # Get the basic edgelist
  ee <- get.data.frame(graph)
  # If no particular attribute value is given, get the proportion (prop = T)
  # or count (prop = F) of all edges between vertices with matching attributes
  if (is.null(attr.val)) {
    ifelse(prop == T, sum(ee[, 1] == ee[, 2]) / nrow(ee), sum(ee[, 1] == ee[, 2]))
  # Otherwise, get the proportion (prop = T) or count (prop = F) of edges with
  # both ends at that value; the proportion is relative to edges touching that value
  } else {
    ifelse(prop == T,
           sum(ee[, 1] == attr.val & ee[, 2] == attr.val) /
             nrow(ee[ee[, 1] == attr.val | ee[, 2] == attr.val, ]),
           sum(ee[, 1] == attr.val & ee[, 2] == attr.val))
  }
}
library(igraph)
set.seed(5165)
# Random directed graph with 100 nodes and a 30% chance of a tie
gg <- random.graph.game(100, 0.3, "gnp", directed = T)
# Randomly assign the node attribute (group numbers 1:5)
V(gg)$group <- sample(1:5, 100, replace = T)
homophily(graph = gg, vertex.attr = "group")
[1] 0.1971504
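One way to read a value like 0.197 is to compare it with the share of same-group edges you would expect if ties ignored group membership entirely. With five roughly equal groups that chance level sits near 1/5, so a raw proportion close to 0.2 in a purely random graph does not by itself signal homophily. The sketch below assumes igraph 1.0 or later for sample_gnp() (the current name for the G(n, p) generator) and ends(); the exact edges may differ from the output above depending on the igraph version.

```r
library(igraph)

# Same kind of graph as above, built with the newer generator name
set.seed(5165)
gg <- sample_gnp(100, 0.3, directed = TRUE)
V(gg)$group <- sample(1:5, 100, replace = TRUE)

# Group label at the tail (from) and head (to) of every edge
el <- ends(gg, E(gg), names = FALSE)
from_grp <- factor(V(gg)$group[el[, 1]], levels = 1:5)
to_grp <- factor(V(gg)$group[el[, 2]], levels = 1:5)

# Observed share of same-group edges (the quantity homophily() returns)
observed <- mean(from_grp == to_grp)

# Share expected if tails and heads were paired at random:
# sum over groups k of (share of tails in k) * (share of heads in k)
a <- prop.table(table(from_grp))
b <- prop.table(table(to_grp))
expected <- sum(a * b)

c(observed = observed, expected = expected)
```

If observed is close to expected, the graph has about as many same-group ties as random mixing would produce.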
However, I also noticed that the igraph package has its own homophily measure, assortativity(), described in the igraph documentation. Running this function gives completely different results, because it is based on the assortativity coefficient, which ranges from -1 to 1. The assortativity coefficient is positive if similar vertices (based on some external property) tend to connect to each other, and negative otherwise.
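For categorical attributes such as group labels, igraph also provides assortativity_nominal(), which follows Newman's (2003) coefficient for discrete types. Writing $e_{kk}$ for the fraction of edges with both ends in group $k$, $a_k$ for the fraction of edges whose tail (from) vertex is in group $k$, and $b_k$ for the fraction whose head (to) vertex is in group $k$:

$$
r = \frac{\sum_k e_{kk} - \sum_k a_k b_k}{1 - \sum_k a_k b_k}
$$

Here $\sum_k e_{kk}$ is exactly the proportion returned by the homophily() function above; $r$ rescales it against the chance level $\sum_k a_k b_k$, so $r = 0$ when same-group ties are as frequent as random mixing predicts and $r = 1$ when every tie is within a group.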
library(igraph)
assortativity(gg, V(gg)$group, directed = T)
[1] -0.02653782
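The -0.027 above comes from treating the group labels 1 to 5 as numbers: assortativity() with a values vector computes a correlation-type coefficient of those values across edges, which is not meaningful for arbitrary category codes. For categories, igraph's assortativity_nominal() is the closer match. The sketch below (assuming igraph 1.0 or later for sample_gnp() and ends()) compares it with a hand computation from the observed and chance-level same-group shares, which shows how it relates to the proportion returned by homophily().

```r
library(igraph)

set.seed(5165)
gg <- sample_gnp(100, 0.3, directed = TRUE)
V(gg)$group <- sample(1:5, 100, replace = TRUE)

# Nominal (categorical) assortativity; group labels must be positive integers
r_igraph <- assortativity_nominal(gg, V(gg)$group, directed = TRUE)

# The same coefficient by hand from the tail/head groups of each edge
el <- ends(gg, E(gg), names = FALSE)
from_grp <- factor(V(gg)$group[el[, 1]], levels = 1:5)
to_grp <- factor(V(gg)$group[el[, 2]], levels = 1:5)

e_ii <- mean(from_grp == to_grp)  # observed same-group share
ab <- sum(prop.table(table(from_grp)) * prop.table(table(to_grp)))  # chance level
r_manual <- (e_ii - ab) / (1 - ab)

c(igraph = r_igraph, manual = r_manual)
```

Both numbers should agree, which suggests the two measures in the question are not rivals: assortativity_nominal() is the same-group proportion corrected for what random mixing would give.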
So right now I am quite confused about which of these methods is the right one for measuring homophily in a network, because the two functions give different results. I also noticed that the igraph method does not support the calculation for particular groups. I would rather go with the first one, which is self-coded (though I am not sure whether it contains mistakes), because its interpretation makes more sense to me. So my question is: which of these methods is the right one for measuring homophily in a network?
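The attr.val argument of the homophily() function above already gives a per-group figure. The sketch below (assuming igraph 1.0 or later for sample_gnp() and ends()) computes that same ratio for every group in one pass, working directly from the edge list instead of overwriting vertex names.

```r
library(igraph)

set.seed(5165)
gg <- sample_gnp(100, 0.3, directed = TRUE)
V(gg)$group <- sample(1:5, 100, replace = TRUE)

# Group label at the tail (from) and head (to) of every edge
el <- ends(gg, E(gg), names = FALSE)
from_grp <- V(gg)$group[el[, 1]]
to_grp <- V(gg)$group[el[, 2]]

# For each group k: edges with both ends in k divided by edges touching k
# (the same ratio as homophily(..., attr.val = k, prop = T))
groups <- sort(unique(V(gg)$group))
per_group <- sapply(groups, function(k) {
  touching <- from_grp == k | to_grp == k
  sum(from_grp == k & to_grp == k) / sum(touching)
})
names(per_group) <- groups
per_group
```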
Can you clarify what you mean by "but is it not the same somehow"?
I couldn't access the link initially and misspoke. The homophily() function above isn't exactly the same and requires a different interpretation.
Original setup and data:
library(igraph)
set.seed(5165)
gg <- random.graph.game(100, 0.3, "gnp", directed = TRUE)
V(gg)$group <- sample(1:5, 100, replace = TRUE)
Original function:
homophily <- function(graph, vertex.attr, attr.val = NULL, prop = T) {
  V(graph)$name <- vertex_attr(graph, vertex.attr)
  ee <- get.data.frame(graph)
  if (is.null(attr.val)) {
    ifelse(prop == T, sum(ee[, 1] == ee[, 2]) / nrow(ee), sum(ee[, 1] == ee[, 2]))
  } else {
    ifelse(prop == T,
           sum(ee[, 1] == attr.val & ee[, 2] == attr.val) /
             nrow(ee[ee[, 1] == attr.val | ee[, 2] == attr.val, ]),
           sum(ee[, 1] == attr.val & ee[, 2] == attr.val))
  }
}
Original results:
homophily(gg, "group")
#> [1] 0.2017368
New function that returns all the relevant details:
new_homophily <- function(graph, vertex.attr) {
  V(graph)$name <- vertex_attr(graph, vertex.attr)
  edges <- get.data.frame(graph)
  # heterophilous ties where vertices have different `"group"` attributes
  external <- length(which(edges$from != edges$to))
  # homophilous ties where vertices have the same `"group"` attributes
  internal <- length(which(edges$from == edges$to))
  list(
    n_external = external,
    n_internal = internal,
    prop_external = external / nrow(edges), # proportion of ties that are heterophilous
    prop_internal = internal / nrow(edges), # proportion of ties that are homophilous (the result of your initial function)
    ei_index = (external - internal) / nrow(edges) # (EL - IL) / (EL + IL)
  )
}
New results:
new_homophily(gg, "group")
#> $n_external
#> [1] 2390
#>
#> $n_internal
#> [1] 604
#>
#> $prop_external
#> [1] 0.7982632
#>
#> $prop_internal
#> [1] 0.2017368 # same as the result of your initial function
#>
#> $ei_index
#> [1] 0.5965264
Interpreting $ei_index should be more straightforward than $prop_internal. Values closer to +1 indicate more heterophily, while values closer to -1 indicate more homophily.
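Because EL + IL is the total number of edges, the E/I index is a linear rescaling of $prop_internal, so the two carry the same information; the E/I index just maps it onto the range -1 to +1. With the results above:

$$
\text{EI} = \frac{EL - IL}{EL + IL} = 1 - 2\,\frac{IL}{EL + IL} = 1 - 2 \times 0.2017368 = 0.5965264
$$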
If that fits your goal, here are some alternative E/I index options:
Michal Bojanowski's routine. It's not on CRAN, but it's available at https://github.com/mbojan/isnar
isnar::ei(gg, "group")
#> [1] 0.5965264
Full disclosure: This is my own routine. The package is decidedly unfinished and it's definitely not on CRAN. https://knapply.github.io/homophily/reference/ei_index.html
homophily::ei_index(gg, node_attr_name = "group")
#> [1] 0.5965264