SQL join table on top value only

I have two tables (Temp1 & Temp2 in this example) that join together. Some IDs in Temp1 have multiple ParentIDs that link to Temp2, but I only want 1 ParentID to actually return. It doesn't matter which one, so let's just say the top value ID in Temp2.

The following code provides a good example of where I'm starting

IF OBJECT_ID('tempdb..#Temp1') IS NOT NULL
    DROP TABLE #Temp1

IF OBJECT_ID('tempdb..#Temp2') IS NOT NULL
    DROP TABLE #Temp2

CREATE TABLE #Temp1
(
    ID INT
    ,ParentID INT
    ,Name VARCHAR(10)
)

CREATE TABLE #Temp2
(
    ID INT
    ,Name VARCHAR(15)
)

INSERT #Temp1 VALUES
(1, 1, 'Apples'),
(2, 1, 'Bananas'),
(3, 2, 'Milk'),
(3, 3, 'Milk'),
(4, 3, 'Wine'),
(4, 4, 'Wine')

INSERT #Temp2 VALUES
(1, 'Fruit'),
(2, 'Dairy'),
(3, 'Beverages'),
(4, 'Beer & Wine')

SELECT
T1.ID [ChildID]
,T1.Name [ChildName]
,T2.ID [ParentID]
,T2.Name [ParentName]
FROM #Temp1 T1
INNER JOIN #Temp2 T2 ON T2.ID = T1.ParentID

The desired result I'm looking for would be:

ChildID     ChildName   ParentID    ParentName
1           Apples      1           Fruit
2           Bananas     1           Fruit
3           Milk        3           Beverages
4           Wine        4           Beer & Wine

I've seen other examples like this where people use CROSS APPLY, but I just can't seem to get it to work. Any assistance is appreciated!

Upvotes: 2

Views: 38

Answers (3)

Zohar Peled
Zohar Peled

Reputation: 82474

One way is to join the #Temp2 table to a cte that is a result of a group by query on #Temp1, that contains a single row for each combination of id and name, with the min (or max) parent id:

;WITH CTETemp1 AS
(
    SELECT ID, Min(ParentID) As PID, Name
    FROM #Temp1
    GROUP BY ID, Name
)

SELECT   T1.ID [ChildID]
        ,T1.Name [ChildName]
        ,T2.ID [ParentID]
        ,T2.Name [ParentName]
FROM CTETemp1 AS T1 
JOIN #Temp2 AS T2 ON T1.PID = T2.ID

BTW, this kind of relationship is called a many to many relationship and it should be implemented by adding a new table to link the ids of #temp1 and #temp2 together - so a better schema would look like this:

CREATE TABLE #Temp1
(
    ID INT
    ,Name VARCHAR(10)
)

CREATE TABLE #Temp2
(
    ID INT
    ,Name VARCHAR(15)
)

CREATE TABLE #Temp1ToTemp2
(
    ID1,
    ID2,
    PRIMARY KEY(ID1, ID2)
)

INSERT #Temp1 VALUES
(1, 'Apples'),
(2, 'Bananas'),
(3, 'Milk'),
(4, 'Wine'),

INSERT #Temp2 VALUES
(1, 'Fruit'),
(2, 'Dairy'),
(3, 'Beverages'),
(4, 'Beer & Wine')

INSERT INTO #Temp1ToTemp2 (ID1, ID2) VALUES
(1,1), (2, 1), (3, 2), (3, 3), (4, 3), (4, 4)

The key benefit is that now you do not have duplicate data stored in your database.

Upvotes: 1

Kamil Gosciminski
Kamil Gosciminski

Reputation: 17147

Your structure breaks database normal form. You clearly have many to many relationship. I suggest that you redesign your table structure. As you have it right now, table #Temp1 does not store information about a product in each row, but instead each row presents a relation to #Temp2.

There should be only one row for each id in #Temp1. For simplicity I'm omitting primary and foreign key constraints:

CREATE TABLE #Temp1 (
  ID INT,
  Name VARCHAR(10)
)

CREATE TABLE #Temp2 (
  ID INT,
  Name VARCHAR(15)
)

CREATE TABLE #TempRelation (
  ID_Temp1 INT,
  ID_Temp2 INT
)

Key concept is that you store each relation between temp1 and temp2 in a separate table #TempRelation.

To populate these tables you need to change your INSERT statements into:

INSERT #Temp1 VALUES
(1, 'Apples'),
(2, 'Bananas'),
(3, 'Milk'),
(4, 'Wine')

INSERT #TempRelation VALUES
(1,1),
(2,1),
(3,2),
(3,3),
(4,3),
(4,4)

Then your query is as simple as this:

SELECT
  tr.id_temp1 as childid,
  t1.name as childname,
  tr.id_temp2 as parentid,
  t2.name as parentname,
FROM (
  SELECT id_temp1, max(id_temp2) as id_temp2
  FROM #TempRelation tr
  GROUP BY id_temp1
) tr
JOIN #Temp1 t1 ON tr.id_temp1 = t1.id
JOIN #Temp2 t2 ON tr.id_temp2 = t2.id

Upvotes: 0

Jayasurya Satheesh
Jayasurya Satheesh

Reputation: 8033

Change your Final Select as below

;WITH CTE
AS
(
    SELECT
    RN = ROW_NUMBER() OVER(PARTITION BY T1.ID ORDER BY T1.Name),
    T1.ID [ChildID]
    ,T1.Name [ChildName]
    ,T2.ID [ParentID]
    ,T2.Name [ParentName]
    FROM #Temp1 T1
    INNER JOIN #Temp2 T2 ON T2.ID = T1.ParentID
)
SELECT
    [ChildID],
    [ChildName],
    [ParentID],
    [ParentName]
    FROM CTE
        WHERE RN = 1

Upvotes: 1

Related Questions