Marc
Marc

Reputation: 661

R igraph: algoritm to find the hierachy between certain edges in a DAG

I have a directed graph (DAG) representing a river network. On some of the rivers (edges) there are flow gauging stations. I would like to rank these stations according to their hierarchy from the most upstream river segments. The most upstream stations will have a class 1. Stations with only 1 station rank upstream will have a class 2, stations with 2 station ranks upstream will have a class 3, and so on. Is there an algorithm in igraph to do that? I searched in the doc for terms like "rank", "hierarchy", "order" but didn't find anything resembling to what I would like to perform.

I also used "distances" from the most downstream edge (the outlet of the river network) for the classification but it does not account for the relations among the stations (edges with very different distances can have the same rank depending on the river network configuration)...

Any suggestion on a graph algorithm to do that?

Here is an illustration of the classification:

enter image description here

Here is the data I use for testing (NOT related to the picture):

library(igraph)

vertices_df <- data.frame(
  id = c(418,410,402,394,386,427,378,370,362,354,346,338,330,322,314,306,298,290,282,274,595,266,419,395,258,250,242,234,226,218,210,202,194,186,178,170,162,146,138,130,122,114,106,98,90,82,74,66,58,50,42,34,26,18,10,2,587,579,571,563,555,547,539,531,523,515,28,12,154,507,499,491,483,475,467,459,451,443,435,411,403,387,379,371,363,355,347,339,331,323,315,307,299,291,283,275,267,259,251,243,235,227,219,211,203,195,187,179,171,163,155,147,139,131,123,115,107,99,91,83,75,67,59,51,43,35,27,19,11,3,20,4),
  station = c(1,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0)
)

edges_df <- data.frame(
  from = c(410,402,394,378,427,403,370,362,354,346,338,330,322,314,306,298,290,282,274,595,587,395,107,3,250,218,226,26,58,66,146,34,18,98,106,74,130,507,571,563,443,547,539,531,523,515,28,20,499,491,483,475,467,459,451,435,411,355,387,379,347,371,331,307,195,291,115,147,171,43,99,123,227,259,27,2,234,10,386,419,194,202,186,122,170,162,258,178,266,242,114,210,154,11,75,90,42,82,50,138,579,67,323,555,179,211,243,219,267,12,4,35,315,363,19,299,51,187,91,155,275,163,339,203,283,131,139,59,83,235,251),
  to = c(418,410,402,394,386,427,378,370,362,354,346,338,330,322,314,306,298,290,282,274,595,266,419,395,258,250,242,234,226,218,210,202,194,186,178,170,162,587,579,571,563,555,547,539,531,523,515,28,507,499,491,483,475,467,459,451,443,435,411,403,387,379,371,363,355,347,339,331,323,315,307,299,291,283,275,418,410,402,394,386,370,362,354,346,338,330,322,314,306,298,282,274,595,419,395,250,234,226,218,210,587,579,571,563,555,547,539,531,523,515,28,507,491,483,467,459,451,443,435,411,403,387,379,355,347,331,323,307,299,291,283)
)

net <- graph.data.frame(
  edges_df[, c("from", "to")],
  directed=TRUE,
  vertices=vertices_df
)

So I know edges 418, 394, 386, 499, 355, 331, 35 have a stations and I would like to rank them with an attribute 1, 2, 3, 4...

After having tried the "distances" and "topo_sort" algo in igraph, my last attempt was to use dfs:

res <- dfs(
  net,
  root = "418",
  neimode = "in",
  unreachable = TRUE,
  order = TRUE,
  order.out = TRUE,
  father = TRUE,
  dist = TRUE
)

res$dist[names(res$dist) %in% as.character(vertices_df[vertices_df$station == 1, "id"])]

So I know the order of the edges from the most downstream edge but I lost the relation information among the upstream edges (edges with different distances can have the same rank). I'm not sure where it leads me so I was wondering if another graph algorithm would better fit the purpose...

I would use R preferably as other parts of the workflow is in that language...

Upvotes: 3

Views: 216

Answers (1)

ThomasIsCoding
ThomasIsCoding

Reputation: 101568

You can try the code below

# subset vertices of interest, i.e., `station == 1`
v <- as.character(with(vertices_df, id[station == 1]))

# find the sink node among `v`
snk <- v[which(colSums(distances(net, v, v, "out") == Inf) == 0)]

# find the paths to the sink node from all other nodes and assign ranks along the path
lst <- lapply(
  v[!v %in% snk],
  function(p) {
    s <- intersect(names(shortest_paths(net, p, snk)$vpath[[1]]), v)
    data.frame(id = s, vrank = seq_along(s))
  }
)

# aggreagte all info and take the max rank for each id
dfout <- aggregate(. ~ id, do.call(rbind, lst), max)

and you will see

> dfout
   id vrank
1 331     1
2  35     1
3 355     1
4 386     2
5 394     3
6 418     4
7 499     2

Upvotes: 2

Related Questions