stats_noob

Reputation: 5925

R: understanding different graph functions

I am trying to understand some of the graph functions available in R (igraph). Here, I create some data and a graph:

library(igraph)
file <-data.frame(

"source" = c(
    "John",
    "John",
    "Tim",
    "Tim",
    "Alex",
    "Andrew",
    "Andrew",
    "Andrew",
    "Oliver",
    "Oliver",
    "Oliver",
    "Matt",
    "Steven",
    "Steven",
    "Steven",
    "Matt",
    "Charles",
    "Charles",
    "Charles",
    "Sean",
    "Ted",
    "Ryan",
    "Ryan",
    "Ryan",
    "Ted",
    "Phil",
    "Phil",
    "Phil",
    "Sam",
    "Toby",
    "Toby",
    "Donald",
    "Donald",
    "Donald",
    "Mitch",
    "Mitch",
    "Mitch"),

"target" = c("Sam",
             "Tim",
             "Alex",
             "Matt",
             "Andrew",
             "Sean",
             "Peter",
             "Ben",
             "Kevin",
             "Thomas",
             "Dave",
             "Steven",
             "Kenny",
             "Derek",
             "CJ",
             "Charles",
             "Ivan",
             "Kyle",
             "Andrew",
             "Ted",
             "Ryan",
             "Daniel",
             "Chris",
             "Scott",
             "Phil",
             "Henry",
             "George",
             "Paul",
             "Toby",
             "Donald",
             "Mitch",
             "Jack",
             "Luke",
             "Myles",
             "Elliot",
             "Harvey",
             "Owen")

)

graph <- graph.data.frame(file, directed = FALSE)
graph <- simplify(graph)
plot(graph)

Some of my questions:

  1. I can run the Louvain clustering algorithm:

https://igraph.org/r/doc/cluster_louvain.html

louvain = cluster_louvain(graph)
plot(louvain,graph)

However, there does not appear to be a method to change the "resolution" for the Louvain Clustering algorithm. I did some research and it does not seem to be possible. Am I correct?

  2. Clustering coefficient

I read the function descriptions but did not understand the difference between the following functions. Could someone please explain it to me?

assortativity.degree(graph)
[1] -0.666401
transitivity(graph)
[1] 0

  3. "compare" function: compare communities from graphs with different number of vertices

I am trying to understand the "compare" function. The way I see it, it is supposed to compare the results of two different clustering algorithms on the same graph. For example, suppose I run "Louvain Clustering" and "fast greedy" on the same graph:

a = cluster_louvain(graph)
b = fastgreedy.community(graph)

Now I use the compare functions:

# part 1
compare(a, b, method="rand")
[1] 0.940256

# part 2
compare(membership(a), membership(b))
[1] 0.460781

It would appear that in part 1 the "compare" function is comparing the overall clustering algorithms on both graphs, whereas in part 2 it is comparing the individual community structures. Am I correct? I thought the "compare" function would compare individual observations.

Thanks

Upvotes: 0

Views: 213

Answers (1)

Vincent Labatut

Reputation: 1808

  1. I think you're right: the implementation of Louvain provided by igraph does not seem to give any control over the resolution. Some other implementations do, e.g. the one in Gephi. Louvain outputs a dendrogram, though, so you can always select the most appropriate cut.

  2. Clustering coefficient and degree assortativity are two completely different measures. The former is related to the ratio of closed to open triads in the graph (see Wikipedia), whereas the latter is akin to a correlation computed between two numerical series made up of the degrees of directly connected vertices (see Wikipedia), so it is concerned with dyads, not triads.

  3. If you have a look at igraph's documentation, you'll see that this function can take a communities object (your first case) but also a membership vector (second case). Both are treated the same way by the function. In your example, the outputs differ because in the first case you ask for the Rand index, whereas in the second case you let the function use its default measure, which is the variation of information ("vi"), not the Rand index.
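
On point 1, here is a minimal sketch of inspecting the levels that Louvain produces. It uses a small illustrative graph rather than the one from the question. It also assumes that more recent igraph releases expose a `resolution` argument in `cluster_louvain`; since that depends on the installed version, the code checks for the argument before using it.

library(igraph)

```r
library(igraph)

# Small illustrative graph (not the one from the question):
# two triangles joined by a single edge.
g <- graph_from_literal(A-B, B-C, C-A, C-D, D-E, E-F, F-D)

set.seed(1)  # Louvain processes vertices in random order

# Assumption: the installed igraph has a `resolution` argument.
# If it does not, fall back to the plain call.
has_resolution <- "resolution" %in% names(formals(cluster_louvain))
lv <- if (has_resolution) {
  # Lower values favour fewer, larger communities.
  cluster_louvain(g, resolution = 0.5)
} else {
  cluster_louvain(g)
}

membership(lv)   # final community of each vertex
lv$memberships   # one row per level of the multi-level algorithm
```

Each row of `lv$memberships` is a complete partition, so picking a row is one way to choose a coarser or finer cut.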
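
On point 2, the difference between the two measures can be sketched on a toy graph (a triangle with one pendant vertex, chosen only for illustration). The transitivity is computed by igraph, while the degree assortativity is also recomputed by hand as a plain correlation, under the assumption that every undirected edge is counted once in each direction.

```r
library(igraph)

# Illustrative graph: triangle A-B-C plus a pendant vertex D attached to C.
g <- graph_from_literal(A-B, B-C, C-A, C-D)

# Clustering coefficient (global transitivity):
# 3 * number of triangles / number of connected triples (triads).
tr <- transitivity(g, type = "global")

# Degree assortativity as a correlation over dyads: list every edge in
# both directions and correlate the degrees found at its two ends.
deg <- degree(g)
el  <- as_edgelist(g, names = FALSE)
ends_from <- c(el[, 1], el[, 2])
ends_to   <- c(el[, 2], el[, 1])
r_manual  <- cor(deg[ends_from], deg[ends_to])

# igraph's built-in version, for comparison with the manual value.
r_igraph <- assortativity_degree(g)
```

A tree such as the graph in the question has no triangles at all, which is why its transitivity is 0 while its assortativity can still be strongly negative.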
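
On point 3, a short sketch showing that `compare` gives the same value for communities objects and for membership vectors once the method is fixed. It uses the Zachary karate club graph bundled with igraph instead of the graph from the question, and the current function name `cluster_fast_greedy`.

```r
library(igraph)

g <- make_graph("Zachary")  # karate club network shipped with igraph

set.seed(42)  # Louvain is randomized
a <- cluster_louvain(g)
b <- cluster_fast_greedy(g)

# Same method, communities objects vs. membership vectors.
rand_obj <- compare(a, b, method = "rand")
rand_vec <- compare(membership(a), membership(b), method = "rand")

# No method given: compare() uses "vi" (variation of information),
# which is a distance rather than a similarity.
vi_default  <- compare(a, b)
vi_explicit <- compare(a, b, method = "vi")
```

Here `rand_obj` equals `rand_vec`, and `vi_default` equals `vi_explicit`, so the two numbers in the question differ only because they are two different measures.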

Upvotes: 2
