Anthony W
Anthony W

Reputation: 1327

Adding nodes to a disconnected graph in order to fully connect the graph components, with inter-node distance constraints

I have a graph where each node has a spatial position given by (x,y), and the edges between the nodes are only connected if the euclidean distance between each node is sqrt(2) or less. Here's my example:

import networkx

G=nx.Graph()

G.add_node(1,pos=(1,1))
G.add_node(2,pos=(2,2))
G.add_node(3,pos=(1,2))

G.add_node(4,pos=(1,4))
G.add_node(5,pos=(2,5))

G.add_node(6,pos=(4,2))
G.add_node(7,pos=(5,2))
G.add_node(8,pos=(5,3))

# Connect component one
G.add_edge(1,2)
G.add_edge(1,3)
G.add_edge(2,3)

# Connect component two
G.add_edge(6,7)

# Connect component three
G.add_edge(6,8)
G.add_edge(7,8)
G.add_edge(4,5)

pos=nx.get_node_attributes(G,'pos')

nx.draw(G,pos)

enter image description here

My question is, how can I determine the optimal position and number of additional nodes such that the graph components are connected, whilst ensuring that any additional node is always within sqrt(2) of an existing node?

Upvotes: 7

Views: 1253

Answers (3)

daniel
daniel

Reputation: 71

I am quite convinced that this problem is NP-hard. The closest problem I know is the geometric Steiner tree problem with octilinear metric. I have two, rather quick-and-dirty, suggestions. Both are heuristic.

1st idea: Formulate the problem as an Euclidean Steiner tree problem (https://en.wikipedia.org/wiki/Steiner_tree_problem#Euclidean_Steiner_tree), where you consider just the nodes of your problem and forget about the edges at first. Solve the problem by using GeoSteiner: http://www.geosteiner.com/ This should quickly give you a solution for problems with 10000 or more nodes (if you need to solve bigger problems, you can write the problem out with GeoSteiner after the full-Steiner tree generation and use https://scipjack.zib.de/). There is no Python interface, but just write your problem to a plain text file, the syntax is quite easy. Afterward, put additional nodes into the solution provided by GeoSteiner such that the \sqrt(2) condition is satisfied. Finally, you need to do some clean-up to get rid of redundant edges, because the solution will not take into account that you already have edges in your original problem. Take all the edges and nodes that you have computed so far and define a weighted graph by giving all of your original edges weight 0 and all of the newly added edges weight 1. Consider a Steiner tree problem in graphs (https://en.wikipedia.org/wiki/Steiner_tree_problem#Steiner_tree_in_graphs_and_variants) on this weighted graph, where the terminal set corresponds to your original nodes. Solve this Steiner tree problem with SCIP-Jack: https://scipjack.zib.de/.

2nd idea: Consider your problem directly as a Steiner tree problem in graphs as follows: Each of the original edges is assigned weight 0, consider all original nodes as terminals. Add additional nodes and edges at distance at most \sqrt(2) from each other. For example, you could put a big rectangle around all your connected components and from each node add recursively additional 8 nodes in an angle at degrees 0,45,90,... at a distance of sqrt(2) and with edge of weight 1 in the Steiner tree problem in graphs, as long as they are within the rectangle. If one of these nodes is within distance sqrt(2) of one of your original nodes, connect them directly with an edge of weight 1. Solve the corresponding Steiner tree problem in graphs with SCIP-Jack.

Upvotes: 1

VirtualScooter
VirtualScooter

Reputation: 1888

Since the maximum length of a vertex can be sqrt(2), we can assume that our graph needs to live in grid world, where nodes have positive integer x and y coordinates, and vertices can only connect nearby nodes, subject to the length restriction.

The solution would be to extend components one at a time with nodes and vertices that bring the components closer together. The connected graph for the question would add nodes 9 and 10, with connecting vertices. See first picture below.

enter image description here

Besides the test mentioned in the OP's question, it seems that the more interesting test would be with node 4 in position (3, 4). The optimal solution in this case in one Steiner point (see enter link description here).

To build such a solution for small graphs, we can use an algorithm that adds one node and edge connecting one node of the pair of nodes that is closest to each other. To accommodate building a Steiner tree, we allow for choosing the location of point in a diagonal location. That is, node 2 and 6 have the same y coordinate, but we'll create node 9 diagonally, so that we can build a bridge to the next closest neighbor. See picture below.

Connected graph on modified problem

To prevent building extra nodes, nodes should be added in turns. Since we have numbered the nodes, this can be done by selecting the smallest node number to grow from. This leads to growing a bigger Steiner tree when more space needs to be crossed. See picture below, where nodes 9 through 12 where added to the original graph.

enter image description here

The rough algorithm for this (and wildly inefficient) is as follows. The Euclidean distance function is modified from Russell & Norvig (Github library with Artificial Intelligence, A Modern Approach by Stuart Russell and Peter Norvig).

import itertools
import networkx as nx
import numpy as np

# Assumes graph G created same or similar to code in OP's question

def closest_nodes(c1, c2, components):
    nodes_c1 = [(n, G.nodes[n]['pos']) for n in components[c1]]
    nodes_c2 = [(n,G.nodes[n]['pos']) for n in components[c2]]
    xnodes = itertools.product(nodes_c1, nodes_c2)
    closest = min(xnodes, key=lambda x: euclidean_distance(x[0], x[1]))
    return euclidean_distance(closest[0], closest[1]), closest

def next_closest_node(n1, c1, c2, components):
    other_nodes = []
    for i,v in enumerate(components):
        if i == c1 or i == c2:
            continue
        other_nodes.extend((n, G.nodes[n]['pos']) for n in v)
    if len(other_nodes) == 0:
        return None
    # print('next closest', n1, other_nodes)
    closest = min(other_nodes, key=lambda x: euclidean_distance(x, n1))
    return closest

def closest_comps(components):
    xcomps = itertools.combinations(range(len(components)), 2)
    closest = min(xcomps,
                  key=lambda x: closest_nodes(x[0], x[1], components)[0])
    return closest

def euclidean_distance(x, y):
    xc = x[1]
    yc = y[1]
    # print('e_d', x, y)
    return np.sqrt(sum((_x - _y) ** 2 for _x, _y in zip(xc, yc)))

def determine_pos(sel_comps, sel_nodes, components):
    next_closest = next_closest_node(sel_nodes[1][0], *sel_comps, components)
    # Grow from the smallest node number
    if sel_nodes[1][0][0] < sel_nodes[1][1][0]:
        from_node = sel_nodes[1][0][0]
        start_pos = sel_nodes[1][0][1]
        target_pos = sel_nodes[1][1][1]
    else:
        from_node = sel_nodes[1][1][0]
        start_pos = sel_nodes[1][1][1]
        target_pos = sel_nodes[1][0][1]
    diff_x = target_pos[0]-start_pos[0]
    diff_y = target_pos[1]-start_pos[1]
    new_x, new_y = start_pos[0], start_pos[1]
    if diff_x != 0:
        new_x += diff_x//abs(diff_x)
    elif next_closest is not None:
        diff_x = next_closest[1][0]-start_pos[0]
        if diff_x != 0:
            new_x += diff_x//abs(diff_x)
    if diff_y != 0:
        new_y += diff_y//abs(diff_y)
    elif next_closest is not None:
        diff_y = next_closest[1][1]-start_pos[1]
        if diff_y != 0:
            new_y += diff_y//abs(diff_y)
    return from_node, (new_x, new_y)

while not nx.is_connected(G):
    components = [c for c in
                  sorted(nx.connected_components(G), key=len, reverse=True)]
    sel_comps = closest_comps(components)
    sel_nodes = closest_nodes(*sel_comps, components)
    sel_dist = euclidean_distance(sel_nodes[1][0], sel_nodes[1][1])
    edge_needed = sel_dist <= np.sqrt(2)
    if edge_needed:
        G.add_edge(sel_nodes[1][0][0], sel_nodes[1][1][0])
    else:
        new_pos = determine_pos(sel_comps, sel_nodes, components)
        new_node = G.number_of_nodes() + 1
        G.add_node(new_node, pos=new_pos[1])
        G.add_edge(new_pos[0], new_node)
    pos = nx.get_node_attributes(G, 'pos')
    nx.draw(G, pos, with_labels=True)

Upvotes: 2

Anthony W
Anthony W

Reputation: 1327

I tried applying a Genetic Algorithm to the problem above. I made an initial guess that two additional nodes would connect all three disconnected components.

import networkx as nx
import pygad
import math
from libpysal import weights
import numpy as np

no_comps_target = 1                    # a connected graph has 1 component
max_dist_between_nodes = math.sqrt(2)  # in km
num_pts = 2                            # number of additional nodes to add
num_genes = num_pts*2                  # number of genes required with the GA. A gene each for x and y coordinates.

# Generate coordinate np array of existing components
centroids = np.array([(v) for k, v in pos.items()])

# Create indcies for new nodes within the GA solution list
y_ix = [x for x in range(num_genes) if x % 2 != 0] 
x_ix = [x for x in range(num_genes) if x % 2 == 0] 

# Define fitness function
def my_fitness_func(solution, solution_idx):
    
    # Select coordinates of GA solution
    xs = np.array(solution[x_ix])
    ys = np.array(solution[y_ix])
    new_pts = np.column_stack((xs,ys))
   
    # Create one set for all coordinates
    all_pts = np.append(centroids, new_pts,axis=0)
        
    # Calculate weights using a distance band equal to the max distance between nodes
    w = weights.DistanceBand.from_array(all_pts, 
                                        threshold=max_dist_between_nodes, 
                                        silence_warnings=True)

    # Convert to a networkx obejct
    G = w.to_networkx()
    
    # Calculate the number of graph components for G
    # Target is 1 - fully connected graph
    no_comps_solution = nx.number_connected_components(G)
         
    
    # Calculate solution fitness
    fitness = 1.0 / np.abs(no_comps_target - no_comps_solution + 0.000001)
        
        return fitness
# Set constraints on possible solution locations
x_max = 5
x_min = 0
y_max = 5
y_min = 1

ga_instance = pygad.GA(
    num_generations=20,
    num_parents_mating=2,
    fitness_func=my_fitness_func,
    sol_per_pop=10,
    num_genes=num_genes,
    gene_type= int,
    gene_space= [{'low': x_min, 'high': x_max, 'step': 1}, 
                 {'low': y_min, 'high': y_max, 'step': 1}] * num_pts,
    mutation_num_genes=1,
    parent_selection_type = "sss",
    keep_parents =2,
    stop_criteria="saturate_10" # Stop if no progress after 10 generations
)

ga_instance.run()

# If final reached a maximum we should expect a fitness of 100,000.
solution, solution_fitness, solution_idx = ga_instance.best_solution()
print("Parameters of the best solution : {solution}".format(solution=solution))
print("Fitness value of the best solution = {solution_fitness}".format(solution_fitness=solution_fitness))

This gives a valid solution:

Parameters of the best solution : [3 3 2 4]
Fitness value of the best solution = 1000000.0

And from running it multiple times, I get multiple valid solutions. Which I think makes sense. Also, how to determine the optimal number of additional nodes? Especially if this problem were to be much larger. I'd still like to know if there are other ways of solving this problem. Especially if they come with less code!

Upvotes: 1

Related Questions