Ankit
Ankit

Reputation: 6664

r igraph find all cycles

I have directed igraph and want to fetch all the cycles. girth function works but only returns the smallest cycle. Is there a way in R to fetch all the cycles in a graph of length greater then 3 (no vertex pointing to itself and loops)

Upvotes: 12

Views: 5674

Answers (1)

G5W
G5W

Reputation: 37661

It is not directly a function in igraph, but of course you can code it up. To find a cycle, you start at some node, go to some neighboring node and then find a simple path back to the original node. Since you did not provide any sample data, I will illustrate with a simple example.

Sample data

## Sample graph
library(igraph)
set.seed(1234)
g = erdos.renyi.game(7, 0.29, directed=TRUE)
plot(g, edge.arrow.size=0.5)

Sample Graph

Finding Cycles

Let me start with just one node and one neighbor. Node 2 connects to Node 4. So some cycles may look like 2 -> 4 -> (Nodes other than 2 or 4) -> 2. Let's get all of the paths like that.

v1 = 2
v2 = 4
lapply(all_simple_paths(g, v2,v1, mode="out"), function(p) c(v1,p))
[[1]]
[1] 2 4 2
[[2]]
[1] 2 4 3 5 7 6 2
[[3]]
[1] 2 4 7 6 2

We see that there are three cycles starting at 2 with 4 as the second node. (I know that you said length greater than 3. I will come back to that.)

Now we just need to do that for every node v1 and every neighbor v2 of v1.

Cycles = NULL
for(v1 in V(g)) {
    for(v2 in neighbors(g, v1, mode="out")) {
        Cycles = c(Cycles, 
            lapply(all_simple_paths(g, v2,v1, mode="out"), function(p) c(v1,p)))
    }
}

This gives 17 cycles in the whole graph. There are two issues though that you may need to look at depending on how you want to use this. First, you said that you wanted cycles of length greater than 3, so I assume that you do not want the cycles that look like 2 -> 4 -> 2. These are easy to get rid of.

LongCycles = Cycles[which(sapply(Cycles, length) > 3)]

LongCycles has 13 cycles having eliminated the 4 short cycles

2 -> 4 -> 2
4 -> 2 -> 4
6 -> 7 -> 6
7 -> 6 -> 7

But that list points out the other problem. There still are some that you cycles that you might think of as duplicates. For example:

2 -> 7 -> 6 -> 2
7 -> 6 -> 2 -> 7
6 -> 2 -> 7 -> 6

You might want to weed these out. To get just one copy of each cycle, you can always choose the vertex sequence that starts with the smallest vertex number. Thus,

LongCycles[sapply(LongCycles, min) == sapply(LongCycles, `[`, 1)]
[[1]]
[1] 2 4 3 5 7 6 2
[[2]]
[1] 2 4 7 6 2
[[3]]
[1] 2 7 6 2

This gives just the distinct cycles.


Addition regarding efficiency and scalability

I am providing a much more efficient version of the code that I originally provided. However, it is primarily for the purpose of arguing that, except for very simple graphs, you will not be able produce all cycles.

Here is some more efficient code. It eliminates checking many cases that either cannot produce a cycle or will be eliminated as a redundant cycle. In order to make it easy to run the tests that I want, I made it into a function.

## More efficient version
FindCycles = function(g) {
    Cycles = NULL
    for(v1 in V(g)) {
        if(degree(g, v1, mode="in") == 0) { next }
        GoodNeighbors = neighbors(g, v1, mode="out")
        GoodNeighbors = GoodNeighbors[GoodNeighbors > v1]
        for(v2 in GoodNeighbors) {
            TempCyc = lapply(all_simple_paths(g, v2,v1, mode="out"), function(p) c(v1,p))
            TempCyc = TempCyc[which(sapply(TempCyc, length) > 3)]
          TempCyc = TempCyc[sapply(TempCyc, min) == sapply(TempCyc, `[`, 1)]
          Cycles  = c(Cycles, TempCyc)
        }
    }
    Cycles
}

However, except for very simple graphs, there is a combinatorial explosion of possible paths and so finding all possible cycles is completely impractical I will illustrate this with graphs much smaller than the one that you mention in the comments.

First, I will start with some small graphs where the number of edges is approximately twice the number of vertices. Code to generate my examples is below but I want to focus on the number of cycles, so I will just start with the results.

## ecount ~ 2 * vcount
Nodes   Edges   Cycles
10   21    15
20   41    18
30   65    34
40   87   424
50  108  3433
55  117 22956

But you report that your data has approximately 5 times as many edges as vertices. Let's look at some examples like that.

## ecount ~ 5 * vcount
Nodes  Edges    Cycles
10      48        3511
12      61       10513
14      71      145745

With this as the growth of the number of cycles, using 10K nodes with 50K edges seems to be out of the question. BTW, it took several minutes to compute the example with 14 vertices and 71 edges.

For reproducibility, here is how I generated the above data.

set.seed(1234)
g10 = erdos.renyi.game(10, 0.2, directed=TRUE)
ecount(g10)
length(FindCycles(g10))

set.seed(1234)
g20 = erdos.renyi.game(20, 0.095 , directed=TRUE)
ecount(g20)
length(FindCycles(g20))

set.seed(1234)
g30 = erdos.renyi.game(30, 0.056 , directed=TRUE)
ecount(g30)
length(FindCycles(g30))

set.seed(1234)
g40 = erdos.renyi.game(40, 0.042 , directed=TRUE)
ecount(g40)
length(FindCycles(g40))

set.seed(1234)
g50 = erdos.renyi.game(50, 0.038 , directed=TRUE)
ecount(g50)
length(FindCycles(g50))

set.seed(1234)
g55 = erdos.renyi.game(55, 0.035 , directed=TRUE)
ecount(g55)
length(FindCycles(g55))

##########
set.seed(1234)
h10 = erdos.renyi.game(10, 0.55, directed=TRUE)
ecount(h10)
length(FindCycles(h10))

set.seed(1234)
h12 = erdos.renyi.game(12, 0.46, directed=TRUE)
ecount(h12)
length(FindCycles(h12))

set.seed(1234)
h14 = erdos.renyi.game(14, 0.39, directed=TRUE)
ecount(h14)
length(FindCycles(h14))

Upvotes: 19

Related Questions