Reputation: 2750
I am trying to find table links using duplicate column names. Say i have the following tables
T1:
| Prod_ID | Cust_Id | Value |
| P1 | C1 | 1 |
| P2 | C2 | 2 |
| P3 | C3 | 3 |
| P4 | C4 | 4 |
| P5 | C5 | 5 |
T2:
| Prod_ID | Prod_Num |
| P1 | PN1 |
| P2 | PN2 |
| P3 | PN3 |
| P4 | PN4 |
| P5 | PN5 |
I rely on system tables to fetch table information. The data looks like
| tabname | colname |
| T1 | Prod_ID |
| T1 | Cust_Id |
| T1 | Value |
| T2 | Prod_ID |
| T2 | Prod_Num |
| T3 | .... |
If i want to find all tables with columns Prod_ID and Cust_ID, i could do the same using
SELECT tabname, count(*)
FROM syscat.columns
WHERE colname IN ('Prod_ID', 'Cust_Id')
GROUP BY tabname
HAVING count(*) > 1
Now, when i want to find how two columns across tables are linked, the query is getting complex.
For example: To find how Cust_Id and Prod_Num are linked, the expected output would be something like
| tabname | colname |
| T1 | Cust_id |
| T1 | Prod_id |
| T2 | Prod_id |
| T2 | Prod_Num |
Suggesting that Prod_Id is contained in both tables and can be used to map Cust_Id and Prod_num. Is there a script for getting something like above?
Upvotes: 1
Views: 44
Reputation: 10701
I would use self-joins for that.
SELECT c1.tabname, c2.colname joinCol, c3.tabname
FROM syscat.columns c1
JOIN syscat.columns c2 ON c1.tabname = c2.tabname
JOIN syscat.columns c3 ON c3.tabname != c2.tabname and c3.colname = c2.colname
JOIN syscat.columns c4 ON c4.tabname = c3.tabname and c3.colname = c2.colname
WHERE c1.colname = 'Cust_Id' and c4.colname = 'Prod_Num'
The output is the following:
tabname joinCol tabname
---------------------------
T1 Prod_id T2
which means that table t1
is joined with t2
using prod_id
(cust_id
and prod_num
are on the input, therefore there is no need to have them on the output)
demo - it is SQL Server, however, JOIN
will work in DB2 as well ;)
Upvotes: 1