Reputation: 168
I've been trying for weeks to figure this out: I need to recursively search a topological network, OpenStreetMap streets in this case, for dead ends, and neighborhoods that hang from the rest of the network by only one edge. These are places where you might expect to see a no-exit sign if your city is considerate like that.
My table has a record for each edge in the network. Each edge has a 'target' and 'source' field, identifying the node to which that side of the edge is connected. I've added a binary column called 'dangling' to indicate whether the edge has been identified as a dea-ending segment. I initialize this column as FALSE, assuming the best.
So far, I've been able to get to identify simply branching dead-ends with the following SQL
WITH node_counts AS ( -- get all unique nodes
SELECT target AS node FROM edge_table WHERE NOT dangling
UNION ALL
SELECT source AS node FROM edge_table WHERE NOT dangling),
single_nodes AS ( -- select only those that occur once
SELECT node
FROM node_counts
GROUP BY node
HAVING count(*) = 1
) --
UPDATE edge_table SET dangling = true
FROM single_nodes
WHERE node = target OR node = source;
I simply keep running this query until no rows are updated. The result looks like this(red is dangling = true):
https://i.sstatic.net/OE1rZ.png
Excellent! This is working great...but there are still cul-de-sac neighborhoods if you will, which are only connected to the larger network by one edge. How can I identify those?
My best guess is that I'm going to need a WITH RECURSIVE at some point, but that's about as far as my unmathmatical mind will go. Can anyone point me in the right direction?
Upvotes: 1
Views: 1332
Reputation: 168
OK. Here's how I figured it out:
I decided that there was not a way, or least not an easy way to implement this in SQL alone. I ended up implementing Tarjan's Algorithm in PHP and SQL, creating a temporary nodes table which linked each node to a strongly connected subcomponent of the graph. Once that was done, I updated any segment that was touching a node which did not belong to the largest subcomponent, as 'dangling'. All edges therefor that started and ended at nodes belonging to the largest subcomponent belong to the main street network (not dangling).
Here's the code. Note that it can take a very long time to run on a large graph. It's also pretty hard on the working memory, but it worked for my purposes.
<?php
$username = '';
$password = '';
$database = '';
$edge_table = 'cincy_segments';
$v1 = 'target';
$v2 = 'source';
$dangling_boolean_field = 'dangling';
$edge_id_field = 'edge_id';
//global variables declared
$index = 0;
$component_index = 0;
$nodes = array();
$stack = array();
pg_connect("host=localhost dbname=$database user=$username password=$password");
// get vertices
echo "getting data from database\n";
$neighbors_query = pg_query("
WITH nodes AS (
SELECT DISTINCT $v1 AS node FROM $edge_table
UNION
SELECT DISTINCT $v2 AS node FROM $edge_table
),
edges AS (
SELECT
node,
$edge_id_field AS edge
FROM nodes JOIN $edge_table
ON node = $v1 OR node = $v2
)
SELECT
node,
array_agg(CASE WHEN node = $v2 THEN $v1
WHEN node = $v1 THEN $v2
ELSE NULL
END) AS neighbor
FROM edges JOIN $edge_table ON
(node = $v2 AND edge = $edge_id_field) OR
(node = $v1 AND edge = $edge_id_field)
GROUP BY node");
// now make the results into php results
echo "putting the results in an array\n";
while($r = pg_fetch_object($neighbors_query)){ // for each node record
$nodes[$r->node]['id'] = $r->node;
$nodes[$r->node]['neighbors'] = explode(',',trim($r->neighbor,'{}'));
}
// create a temporary table to store results
pg_query("
DROP TABLE IF EXISTS temp_nodes;
CREATE TABLE temp_nodes (node integer, component integer);
");
// the big traversal
echo "traversing graph (this part takes a while)\n";
foreach($nodes as $id => $values){
if(!isset($values['index'])){
tarjan($id, 'no parent');
}
}
// identify dangling edges
echo "identifying dangling edges\n";
pg_query("
UPDATE $edge_table SET $dangling_boolean_field = FALSE;
WITH dcn AS ( -- DisConnected Nodes
-- get nodes that are NOT in the primary component
SELECT node FROM temp_nodes WHERE component != (
-- select the number of the largest component
SELECT component
FROM temp_nodes
GROUP BY component
ORDER BY count(*) DESC
LIMIT 1)
),
edges AS (
SELECT DISTINCT e.$edge_id_field AS disconnected_edge_id
FROM
dcn JOIN $edge_table AS e ON dcn.node = e.$v1 OR dcn.node = e.$v2
)
UPDATE $edge_table SET $dangling_boolean_field = TRUE
FROM edges WHERE $edge_id_field = disconnected_edge_id;
");
// clean up after ourselves
echo "cleaning up\n";
pg_query("DROP TABLE IF EXISTS temp_nodes;");
pg_query("VACUUM ANALYZE;");
// the recursive function definition
//
function tarjan($id, $parent)
{
global $nodes;
global $index;
global $component_index;
global $stack;
// mark and push
$nodes[$id]['index'] = $index;
$nodes[$id]['lowlink'] = $index;
$index++;
array_push($stack, $id);
// go through neighbors
foreach ($nodes[$id]['neighbors'] as $child_id) {
if ( !isset($nodes[$child_id]['index']) ) { // if neighbor not yet visited
// recurse
tarjan($child_id, $id);
// find lowpoint
$nodes[$id]['lowlink'] = min(
$nodes[$id]['lowlink'],
$nodes[$child_id]['lowlink']
);
} else if ($child_id != $parent) { // if already visited and not parent
// assess lowpoint
$nodes[$id]['lowlink'] = min(
$nodes[$id]['lowlink'],
$nodes[$child_id]['index']
);
}
}
// was this a root node?
if ($nodes[$id]['lowlink'] == $nodes[$id]['index']) {
do {
$w = array_pop($stack);
$scc[] = $w;
} while($id != $w);
// record results in table
pg_query("
INSERT INTO temp_nodes (node, component)
VALUES (".implode(','.$component_index.'),(',$scc).",$component_index)
");
$component_index++;
}
return NULL;
}
?>
Upvotes: 1
Reputation: 44250
IMO it is not possible without loop-detection. (the dangling bit is a kind of breadcrum-loopdetection). The below query is a forking Y-shape leading into two dead-end-streets (1..4 and 11..14). If you add the link between #19 back to #15, the recursion will not stop. (Maybe my logic is incorrect or incomplete?)
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE edge_table
( source INTEGER NOT NULL
, target INTEGER NOT NULL
, dangling boolean NOT NULL DEFAULT False
);
INSERT INTO edge_table ( source, target) VALUES
(1,2) ,(2,3) ,(3,4)
,(11,12) ,(12,13) ,(13,14)
,( 15,16) ,(16,17) ,(17,18) ,( 18,19)
-- , (19,15) -- this will close the loop
, (19,1) -- Y-fork
, (19,11) -- Y-fork
;
-- EXPLAIN
WITH RECURSIVE cul AS (
SELECT e0.source AS source
, e0.target AS target
FROM edge_table e0
WHERE NOT EXISTS ( -- no way out ...
SELECT * FROM edge_table nx
WHERE nx.source = e0.target
)
UNION ALL
SELECT e1.source AS source
, e1.target AS target
FROM edge_table e1
JOIN cul ON cul.source = e1.target
WHERE 1=1
AND NOT EXISTS ( -- Only one incoming link; no *other* way to cul
SELECT * FROM edge_table nx
WHERE nx.target = cul.source
AND nx.source <> e1.source
)
)
SELECT * FROM cul
;
[ the CTE is of course intended to be used in an update statement to set the dangling fields ]
Upvotes: 0