Reputation: 105
I've got a SQL server table in which each row represents an edge in a graph network. The FromNodeID and ToNodeID are foreign keys to a node table, and the schema is something like this:
CREATE TABLE #Edges (
EdgeID int identity (1,1),
FromNodeID int,
ToNodeID int
);
INSERT INTO #Edges (FromNodeID, ToNodeID) VALUES
(1,2),
(1,3),
(1,4),
(2,3),
(3,5),
(4,5),
(5,6);
Now, if I consider each edge to be directed (i.e., one way), then it's easy to work out all those nodes that I can get to directly from any node. I'd add an index to the FromNodeID column, and then run a query like this:
SELECT ToNodeID FROM #Edges WHERE FromNodeID = 3
Result: 5
But what would be the best way to structure my table/query if I want to treat each edge as unidirectional. i.e. starting from node 3, I'd like to get the results:
Result: 1, 2, 5
The simplest way I can think of would be to add an additional index to the ToNodeID column and then run a query like this:
SELECT ToNodeID FROM #Edges WHERE FromNodeID = 3
UNION SELECT FromNodeID FROM #Edges WHERE ToNodeID = 3;
But this obviously involves combining result sets from two queries and doesn't seem very efficient - is there a better way to write this in a single query? (Note that I don't want to insert the reversed edges again into the table - I need to be able to treat the edges as either directed or undirected at runtime).
Thanks for any advice!
Upvotes: 6
Views: 4575
Reputation: 425863
But this obviously involves combining result sets from two queries and doesn't seem very efficient - is there a better way to write this in a single query?
This is efficient enough.
You could do this:
SELECT CASE 3 WHEN FromNodeId THEN ToNodeId ELSE FromNodeId END
FROM Edges
WHERE 3 IN (FromNodeId, ToNodeId)
but this will be essentially the same (will UNION
these indexes under the hood).
Here's a script to test:
CREATE TABLE #Edges
(
EdgeID INT IDENTITY (1,1) PRIMARY KEY,
FromNodeID int NOT NULL,
ToNodeID int NOT NULL
)
CREATE INDEX ix_edges_from ON #Edges (FromNodeID, ToNodeId)
CREATE INDEX ix_edges_to ON #Edges (ToNodeID, FromNodeId)
;
WITH q (rn) AS
(
SELECT 1
UNION ALL
SELECT rn + 1
FROM q
WHERE rn < 1000
)
INSERT
INTO #Edges (FromNodeId, ToNodeId)
SELECT q1.rn, q2.rn
FROM q q1
CROSS JOIN
q q2
WHERE (q1.rn + q2.rn) % 37 = 0
OPTION (MAXRECURSION 0)
For the UNION
:
SELECT ToNodeId
FROM #Edges
WHERE FromNodeId = 3
UNION
SELECT FromNodeId
FROM #Edges
WHERE ToNodeId = 3
|--Stream Aggregate(GROUP BY:([Union1006]))
|--Merge Join(Concatenation)
|--Index Seek(OBJECT:([tempdb].[dbo].[#Edges]), SEEK:([tempdb].[dbo].[#Edges].[FromNodeID]=(3)) ORDERED FORWARD)
|--Index Seek(OBJECT:([tempdb].[dbo].[#Edges]), SEEK:([tempdb].[dbo].[#Edges].[ToNodeID]=(3)) ORDERED FORWARD)
For the IN
:
|--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN (3)=[tempdb].[dbo].[#Edges].[FromNodeID] THEN [tempdb].[dbo].[#Edges].[ToNodeID] ELSE [tempdb].[dbo].[#Edges].[FromNodeID] END))
|--Sort(DISTINCT ORDER BY:([tempdb].[dbo].[#Edges].[EdgeID] ASC))
|--Concatenation
|--Index Seek(OBJECT:([tempdb].[dbo].[#Edges]), SEEK:([tempdb].[dbo].[#Edges].[FromNodeID]=(3)) ORDERED FORWARD)
|--Index Seek(OBJECT:([tempdb].[dbo].[#Edges]), SEEK:([tempdb].[dbo].[#Edges].[ToNodeID]=(3)) ORDERED FORWARD)
As you can see, the plans are essentially the same: they both take values from the corresponding indexes and concatenate the results.
The UNION
query is in fact a little more efficient, since it uses a Merge Join
to concatenate the results, and the records come out of the merge join naturally ordered, so the Stream Aggregate
does not need to sort.
Upvotes: 5
Reputation: 77450
There are three options I can think of: do it only in the table, only in the queries, or create a view. For the table, create triggers that enforce the symmetric closure (e.g. when inserting (a,b), also insert (b,a); when updating (a,b) to (c,d), delete the old symmetry-preserving (b,a) pair, then insert (d,c)). Note that this may not work, as some RDBMSs (I'm not sure if SQL Server is one) don't allow insertions/updates to the table that a trigger fired on.
In queries,
SELECT CASE FromNodeID WHEN 3 THEN ToNodeId ELSE FromNodeId END
FROM #Edges
WHERE FromNodeID=3 OR ToNodeID=3
For the view, create one that's the symmetric closure of the original table. I think you'd still have to use a UNION, but it could simplify query writing.
CREATE VIEW undirected_edges (FirstEdgeID,SecondEdgeId)
AS (SELECT FromNodeID, ToNodeID FROM #Edges)
UNION DISTINCT
(SELECT ToNodeID, FromNodeID FROM #Edges)
Upvotes: 0
Reputation: 4080
Must you process the graph directly from SQL Server? If you are really concerned about performance, you should use one of the data-structures specifically for representing and processing graphs. Most of the work I have done with graphs (and I have done plenty) would have been infeasible if I used a generic database backend to consult the graphs.
One of the most effective representations that I have used is described in the appendices of a compiler book I have: Engineering a Compiler, by Keith Cooper and Linda Torczon.
Upvotes: 1