Reputation: 1956
I currently have 2 tables:
CrawlData
id (autoincrement), Source, Destination, and some more columns
Nodes
id (autoincrement), URL
The Nodes table contains the distinct Source values from CrawlData
Now I would like to have a table that is a kind of look up table that contains the ID's from Nodes instead of the texts in Source and Destination from CrawlData
I can get all the ID's with a Select query using Join on the URL=Source and URL = Destination, but don't know how to combine these and then also to get them in a new table Edges with 2 columns:
Upvotes: 1
Views: 1966
Reputation: 263703
You can INSERT
the records returned by SELECT
statement using INSERT INTO...SELECT
statement.
INSERT INTO Edges(SourceNode, DestinationNode)
SELECT b.ID SourceNode,
c.ID DestinationNode
FROM CrawlData a
INNER JOIN Nodes b
ON a.Source = b.URL
INNER JOIN Nodes c
ON a.Destination = c.URL
To further gain more knowledge about joins, kindly visit the link below:
For faster execution, execute the following statements to add INDEX
on the columns to avoid FULL TABLE SCAN
which could be slow if doing on large RDBMS.
ALTER TABLE Nodes ADD INDEX (URL);
if happens that all values of Source
and Destination
column are present on Nodes.URL
, declare these columns as foreign keys,
ALTER TABLE CrawlData
ADD CONSTRAINT cd_fk1 FOREIGN KEY (Source) REFERENCES Nodes(URL)
ALTER TABLE CrawlData
ADD CONSTRAINT cd_fk2 FOREIGN KEY (Destination) REFERENCES Nodes(URL)
otherwise, add normal index on them
ALTER TABLE CrawlData ADD INDEX (Source);
ALTER TABLE CrawlData ADD INDEX (Destination);
Upvotes: 2
Reputation: 2937
You can join twice to the Nodes table. Once, use the Source to join to URLK. The next time use the Destination.
Conceptually, it is like using two copies of the Nodes table, each with a different name (say "S" and "D"). You get:
select S.ID As SOURCE_ID, D.ID As DEST_ID
from CrawlData
join Nodes S on Source = S.URL
join Nodes D on Destination = D.URL
Upvotes: 1