Reputation: 93
I'm creating a joined view of two tables, but am getting unwanted duplicates from table2.
For example: table1 has 9000 records and I need the resulting view to contain exactly the same number; table2 may have multiple records with the same FKID, but I only want to return one of them (a randomly chosen one is OK with my customer). I have the following code that works correctly, but performance is slower than desired (over 14 seconds).
SELECT
OBJECTID
, PKID
,(SELECT TOP (1) SUBDIVISIO
FROM dbo.table2 AS t2
WHERE (t1.PKID = t2.FKID)) AS ProjectName
,(SELECT TOP (1) ASBUILT1
FROM dbo.table2 AS t2
WHERE (t1.PKID = t2.FKID)) AS Asbuilt
FROM dbo.table1 AS t1
Is there a way to do something similar with joins to speed up performance?
I'm using SQL Server 2008 R2.
I got close with the following code (~0.5 seconds), but DISTINCT only filters out rows when all selected columns are duplicated (rather than just the FKID), so I still get duplicates.
SELECT
t1.OBJECTID
,t1.PKID
,t2.ProjectName
,t2.Asbuilt
FROM dbo.table1 AS t1
LEFT JOIN (SELECT
DISTINCT FKID
,ProjectName
,Asbuilt
FROM dbo.table2) t2
ON t1.PKID = t2.FKID
Table examples:

table1          table2
OID, PKID       FKID, ProjectName, Asbuilt
1,   id1        id1,  P1,          AB1
2,   id2        id1,  P5,          AB5
3,   id4        id2,  P10,         AB2
5,   id5        id5,  P4,          AB4
In the above example, the returned records should be id5/P4/AB4, id2/P10/AB2, and (id1/P1/AB1 OR id1/P5/AB5).
My search came up with similar questions, but none that resolved my problem. link, link
Thanks in advance for your help. This is my first post so let me know if I've broken any rules.
Upvotes: 9
Views: 52941
Reputation: 7092
If you want the described result, you need to use an INNER JOIN; the following query will satisfy your need:
SELECT
t1.OID,
t1.PKID,
MAX(t2.ProjectName) AS ProjectName,
MAX(t2.Asbuilt) AS Asbuilt
FROM table1 t1
JOIN table2 t2 ON t1.PKID = t2.FKID
GROUP BY
t1.OID,
t1.PKID
If you want to see all rows from the left table (table1), whether or not they have a match in the right table, use LEFT JOIN instead; the same query will then give you the desired result.
EDITED
This construction has good performance, and you don't need to use subqueries.
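For reference, the LEFT JOIN variant mentioned above could look like the sketch below. It assumes the table and column names from the question's example (OID, PKID, FKID, ProjectName, Asbuilt). One thing to keep in mind: each MAX is evaluated per column, so when an FKID has several rows in table2, ProjectName and Asbuilt are not guaranteed to come from the same table2 row.

```sql
-- Sketch only: same query as above, with LEFT JOIN so that every
-- table1 row is returned even when table2 has no matching FKID.
SELECT
    t1.OID,
    t1.PKID,
    MAX(t2.ProjectName) AS ProjectName,  -- NULL when there is no match
    MAX(t2.Asbuilt)     AS Asbuilt       -- picked independently of ProjectName
FROM table1 AS t1
LEFT JOIN table2 AS t2 ON t1.PKID = t2.FKID
GROUP BY
    t1.OID,
    t1.PKID;
```

If both values must come from one and the same table2 row, a TOP (1) or ROW_NUMBER() based approach is the safer choice.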
Upvotes: 1
Reputation: 56725
This will give the results you requested and should have the best performance.
SELECT
OBJECTID
, PKID
, t2.SUBDIVISIO
, t2.ASBUILT1
FROM dbo.table1 AS t1
OUTER APPLY (
SELECT TOP 1 *
FROM dbo.table2 AS t2
WHERE t1.PKID = t2.FKID
) AS t2
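Since TOP 1 without ORDER BY returns an arbitrary row, the result may differ between runs. If a repeatable pick is preferred, a variant along these lines might help. This is a sketch under assumptions: the index name IX_table2_FKID is hypothetical, the choice of ORDER BY columns is arbitrary, and whether the index actually improves the plan depends on your data and existing indexes.

```sql
-- Hypothetical index (name is made up); only useful if table2 has no
-- suitable index on FKID yet. INCLUDE lets the lookup avoid the base table.
CREATE NONCLUSTERED INDEX IX_table2_FKID
    ON dbo.table2 (FKID)
    INCLUDE (SUBDIVISIO, ASBUILT1);

SELECT
    t1.OBJECTID
  , t1.PKID
  , t2.SUBDIVISIO AS ProjectName
  , t2.ASBUILT1   AS Asbuilt
FROM dbo.table1 AS t1
OUTER APPLY (
    -- At most one table2 row per table1 row; both columns come from that row.
    SELECT TOP (1) x.SUBDIVISIO, x.ASBUILT1
    FROM dbo.table2 AS x
    WHERE x.FKID = t1.PKID
    ORDER BY x.SUBDIVISIO, x.ASBUILT1  -- arbitrary but repeatable tie-breaker
) AS t2;
```

Because OUTER APPLY keeps table1 rows that have no match, the row count stays equal to that of table1, with NULLs in the two table2 columns for unmatched rows.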
Upvotes: 14
Reputation: 1269613
Your original query is producing arbitrary values for the two columns (the use of top with no order by). You can get the same effect with this:
SELECT t1.OBJECTID, t1.PKID, t2.ProjectName, t2.Asbuilt
FROM dbo.table1 t1 LEFT JOIN
(SELECT FKID, min(ProjectName) as ProjectName, MIN(asBuilt) as AsBuilt
FROM dbo.table2
group by fkid
) t2
ON t1.PKID = t2.FKID
This version replaces the distinct with a group by.
To get a truly random row in SQL Server (which your syntax suggests you are using), try this:
SELECT t1.OBJECTID, t1.PKID, t2.ProjectName, t2.Asbuilt
FROM dbo.table1 t1 LEFT JOIN
(SELECT FKID, ProjectName, AsBuilt,
ROW_NUMBER() over (PARTITION by fkid order by newid()) as seqnum
FROM dbo.table2
) t2
ON t1.PKID = t2.FKID and t2.seqnum = 1
This assumes version 2005 or greater.
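To try the ROW_NUMBER() approach without touching real tables, a self-contained sketch like the following could be used. It loads the sample rows from the question into table variables; the variable names @table1 and @table2 and the column types are assumptions made for illustration. The multi-row VALUES syntax needs SQL Server 2008 or later.

```sql
-- Table variables standing in for dbo.table1 / dbo.table2 (types assumed).
DECLARE @table1 TABLE (OID int, PKID varchar(10));
DECLARE @table2 TABLE (FKID varchar(10), ProjectName varchar(10), Asbuilt varchar(10));

-- Sample data from the question.
INSERT INTO @table1 (OID, PKID)
VALUES (1, 'id1'), (2, 'id2'), (3, 'id4'), (5, 'id5');

INSERT INTO @table2 (FKID, ProjectName, Asbuilt)
VALUES ('id1', 'P1', 'AB1'), ('id1', 'P5', 'AB5'),
       ('id2', 'P10', 'AB2'), ('id5', 'P4', 'AB4');

-- Number the table2 rows within each FKID in random order and keep row 1.
SELECT t1.OID, t1.PKID, t2.ProjectName, t2.Asbuilt
FROM @table1 AS t1
LEFT JOIN (
    SELECT FKID, ProjectName, Asbuilt,
           ROW_NUMBER() OVER (PARTITION BY FKID ORDER BY NEWID()) AS seqnum
    FROM @table2
) AS t2
    ON t1.PKID = t2.FKID AND t2.seqnum = 1;
```

This should return one row per table1 row: id1 paired with either P1/AB1 or P5/AB5, and id4 with NULLs because it has no match in table2.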
Upvotes: 3