Reputation: 17676
I am curious how to compute some metrics for each node.
For each node, I want to compute the percentage of fraudulent connections among
its direct neighbors (directed)
its direct neighbors (undirected)
the friendship network reachable from the node (directed)
the friendship network reachable from the node (undirected)
both in total and per relationship type.
I am just getting started with igraph and am not sure how to move on to writing my own graph processing functions (i.e. going beyond applying degree, pagerank, ...). I would appreciate suggestions for solving this task, ideally with only one pass over the graph.
A minimal sample is here:
library(igraph)
id = c("a", "b", "c", "d", "e", "f", "g")
name = c("Alice", "Bob", "Charlie", "David", "Esther", "Fanny", "Gaby")
fraud = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
verticeData <- data.frame(id, name, fraud)
verticeData
src <- c("a", "b", "c", "f", "e", "e", "d", "a")
dst <- c("b", "c", "b", "c", "f", "d", "a", "e")
relationship <-c("A", "B", "B", "B", "B", "A", "A", "A")
edgeData <- data.frame(src, dst, relationship)
edgeData
g <- graph_from_data_frame(edgeData, directed = TRUE, vertices = verticeData)
plot(g, vertex.color=V(g)$fraud)
# TODO compute metrics
I do not have the privileges to migrate the question, so I am reposting it manually, following a comment on https://stats.stackexchange.com/questions/256859/igraph-compute-metrics-for-each-node-and-its-network
Upvotes: 2
Views: 702
Reputation: 3729
The gapply function from the sna package gives a lot of flexibility to calculate various ego network statistics. It functions more or less like the apply family of functions, but specifically loops over network neighborhoods. The intergraph package makes it easy to convert between igraph and sna.
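library(sna)
# Convert the igraph object into a network object that sna can work with
net <- intergraph::asNetwork(g)
both <- c(1, 2)  # rows and columns together, i.e. ties in either direction
funcs <- c(sum, mean)
# Apply each summary function to each kind of neighborhood
for (f in funcs) {
  for (margin in list(1, 2, both)) {
    print(gapply(net, margin, net %v% "fraud", f))
  }
}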
gapply is not entirely straightforward to use. The second argument (MARGIN) selects the neighborhood: 1 for rows (outgoing ties), 2 for columns (incoming ties), or c(1,2) for both (i.e., ignoring direction). The third argument is a vector of vertex-level values to summarize (here the fraud attribute), and the fourth argument is the function applied to those values within each neighborhood. As you can see, there is a lot of flexibility in the third and fourth arguments.
> gapply(net,c(1,2),net %v% "fraud",sum)
[1] 0 1 0 1 1 0 0
> gapply(net,c(1),net %v% "fraud",sum)
  Alice     Bob Charlie   David  Esther   Fanny    Gaby
      0       0       0       1       0       0       0
> gapply(net,c(2),net %v% "fraud",sum)
  Alice     Bob Charlie   David  Esther   Fanny    Gaby
      0       1       0       0       1       0       0
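If you would rather stay within igraph, the same idea can be sketched with ego(), which returns the neighborhood of every vertex. This is only a rough sketch, not a tuned or single-pass solution. The helper names fraud_share and fraud_share_by_type are made up for illustration, the display-name column from the question is dropped so that vertices are labelled by id, and vertices without any neighbors are assumed to get NA rather than 0.

```r
library(igraph)

# Rebuild the sample graph from the question so the sketch runs on its own.
# The display-name column is left out, so vertices are labelled "a" ... "g".
vertices <- data.frame(
  id    = c("a", "b", "c", "d", "e", "f", "g"),
  fraud = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
)
edges <- data.frame(
  src          = c("a", "b", "c", "f", "e", "e", "d", "a"),
  dst          = c("b", "c", "b", "c", "f", "d", "a", "e"),
  relationship = c("A", "B", "B", "B", "B", "A", "A", "A")
)
g <- graph_from_data_frame(edges, directed = TRUE, vertices = vertices)

# Hypothetical helper: share of fraudulent vertices among everything reachable
# within `order` steps. mindist = 1 excludes the vertex itself.
# mode = "out" / "in" follows edge direction, mode = "all" ignores it.
fraud_share <- function(graph, mode = "all", order = 1) {
  hoods <- igraph::ego(graph, order = order, nodes = V(graph),
                       mode = mode, mindist = 1)
  shares <- vapply(
    hoods,
    function(nb) if (length(nb) == 0) NA_real_ else mean(nb$fraud),
    numeric(1)
  )
  setNames(shares, V(graph)$name)
}

# Hypothetical helper: same metric, restricted to one relationship type.
# Edges of other types are removed, but all vertices are kept.
fraud_share_by_type <- function(graph, type, ...) {
  drop <- which(E(graph)$relationship != type)
  fraud_share(delete_edges(graph, drop), ...)
}

direct_out <- fraud_share(g, mode = "out")             # direct, directed
direct_all <- fraud_share(g, mode = "all")             # direct, undirected
two_step   <- fraud_share(g, mode = "out", order = 2)  # wider network, directed
type_a_out <- fraud_share_by_type(g, "A", mode = "out")
```

For David (d), the directed share is 1 because his only outgoing tie goes to Alice, which matches the gapply result for MARGIN = 1 above. How far the "friendship network" should extend is a modelling choice, so order = 2 here is just a placeholder.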
Upvotes: 4