William
William

Reputation: 164

Reduce the complexity of the algorithm to construct a directed graph (DAG) from an undirected graph that satisfies the given constraints

I have a network of more than 4,000 nodes and I have a list of edges (connections between pairs of nodes). All nodes should converge to a single central point, but I have no way of ordering the nodes, since they are not numbered or labeled in a way that it is feasible to reorder them.

What I need?: Based on the small example attached, I need all the nodes point to node F (F is reachable from all nodes), so that the undirected graph becomes a directed graph (DAG) and that as a restriction there is only a single edge between each node pair. I am allowed to remove edges if and only if it is to remove loops (eg A -> B, B <- A). I can't add edges either, as this is a real network and I can't create connections where they don't exist.

What I have is this:

 library(igraph)
 library(tidygraph)
 library(ggraph)
 library(tidyverse)

 # edge list
 edgelist <- tribble(
  ~from, ~to,
  "A", "B",
  "A", "C",
  "B", "D",
  "C", "D",
  "C", "E",
  "D", "E",
  "D", "F")
 
 # create the graph
 g <- as_tbl_graph(edgelist)
 
 # undirected graph 
 g %>% 
  ggraph(layout = "graphopt") +
  geom_edge_link() +
  geom_node_point(shape = 21, size = 18, fill = 'white') +
  geom_node_text(aes(label = name), size = 3) +
  theme_graph() 

This is the procedure I came up with to do the sorting so that the list of edges would become a DAG:

 s <- names(V(g))
 
 # define node objective
 target <- "F"
 
 # exclude target from vertex list
 vertex_list <- s[s != target]
 
 # calculate the simple path of each node to the destination node (target)
 route_list <- map(vertex_list, ~ all_simple_paths(graph = g, 
                                                   from = .x,
                                                   to = target)) %>% 
  set_names(vertex_list) %>% 
  map(~ map(., ~ names(.x))) %>%
  flatten() %>% 
  map(~ str_c(.x, collapse = ","))
 
 
 # generate the list of ordered edges
 ordered_edges <- do.call(rbind, route_list) %>% 
  as.data.frame(row.names = F) %>%  
  set_names("chain") %>% 
  group_by(chain) %>% 
  summarise(destination = str_split(chain, ","), .groups = "drop") %>% 
  mutate(
   
   from = map(destination, ~ lag(.x)) %>% 
    map(~ .x[!is.na(.x)]),
   
   to = map(destination, ~ lead(.x)) %>% 
    map(~ .x[!is.na(.x)]),
  ) %>% 
  
  select(from, to) %>% 
  unnest(cols = everything()) %>% 
  group_by(across(everything())) %>% 
  summarise(enlaces = n(), .groups = "drop") %>% 
  select(-enlaces)

Warning: When the number of nodes is of a certain size (let's say 90), this algorithm generates loops that make the graph non-acyclic, so an additional procedure I do is apply a function in Python called feedback_arc_set to remove the edges that will make the graph a DAG.

For simplicity, I am not including the necessary code to remove these loops, since in this specific example no loops are generated.

 # draw the graph again
 as_tbl_graph(ordered_edges) %>% 
  ggraph(layout = "graphopt") +
  geom_edge_link(arrow = arrow(length = unit(3, 'mm'),
                               type = "closed", 
                               angle = 30),
                 end_cap = circle(7, 'mm')) +
  geom_node_point(shape = 21, size = 18, fill = 'white') +
  geom_node_text(aes(label = name), size = 3) +
  theme_graph() 

Created on 2021-07-07 by the reprex package (v2.0.0)

So what is the problem?: The complexity of the algorithm when the number of nodes is greater than 2000

If I try to do this with say 2000 nodes the algorithm never ends. I left it running for 24 hours and it didn't finish. In fact, I didn't find a way to know if it was working. In this place I found that the function of {igraph} all_simple_paths uses DFS behind the scenes, but complexity is O (|V|!) where |V| is the number of vertices and |V|! is the factorial of the number of vertices.

Is there a way to do this with less complexity?

Upvotes: 1

Views: 484

Answers (2)

ThomasIsCoding
ThomasIsCoding

Reputation: 102379

Quick Answer

Actually you can split the vertices into groups in terms of distances to "F", and then check the neighborhood between nodes from two adjacent groups to add the edges.


Idea Behind

With respect to the distances to "F", the idea comes from the facts that:

  1. If a node is with distance d, then its parent(s) must be with distance d+1.
  2. If X is with distance d+1, then the nodes with distance d must be the children of X if and only if they are neighbors of X.

My Attempt

D <- distances(g)
d <- distances(g, "F")
lst <- split(colnames(d), d)
lst <- lst[order(as.integer(names(lst)))]
res <- c()
for (k in head(seq_along(lst), -1)) {
    pre <- lst[[k]]
    nxt <- lst[[k + 1]]
    for (p in pre) {
        dp <- D[p, nxt, drop = FALSE]
        if (any(dp == 1)) {
            res[[length(res) + 1]] <- data.frame(
                from = colnames(dp)[dp == 1],
                to = p
            )
        }
    }
}
dag <- graph_from_data_frame(do.call(rbind, res))

then

plot(dag)

gives

enter image description here

Upvotes: 1

ravenspoint
ravenspoint

Reputation: 20586

There is no way of avoiding doing a DFS. However the problem is not due to the complexity of the DFS algorithm. I can do a DFS on a graph of 403,394 nodes and 3,387,388 links in less than a second https://github.com/JamesBremner/PathFinder2/wiki/Performance

The likely problem is that your algorithm calls for a huge number of DFS's to be performed.

I would suggest the following algorithm, which should run in a second or so with a graph of moderate size such as 4,000 nodes.

The first thing you need to do is check that every node is reachable from F. A single DFS starting from F will tell you this. If every node is not reachable, the problem is not solvable without adding edges.

Now, traverse the paths to determine which direction each link should have. Note that any edge that is not traversed is unnecessary and can be removed - thus preventing the "accidental" introduction of loops

Note that if you have a decent DFS implementation that allows you to specify a visitor, you can do this in one step, marking the direction of edges as the DFS goes along. All that remains is to remove unnecessary edges that have not been visited. The whole thing will then run in a flash on a 4,000 node graph.

===

Any interest in an application to give a fast solution to this problem? Running on a MSWindows machine, written in C++17, based on the PathFinder class, guaranteed performance > 1,000 nodes/second?

Upvotes: 1

Related Questions