Jim Billig
Jim Billig

Reputation: 344

SQL return only distinct IDs from LEFT JOIN

I've inherited some fun SQL and am trying to figure out how to how to eliminate rows with duplicate IDs. Our indexes are stored in a somewhat columnar format and then we pivot all the rows into one with the values as different columns.

The below sample returns three rows of unique data, but the IDs are duplicated. I need just two rows with unique IDs (and the other columns that go along with it). I know I'll be losing some data, but I just need one matching row per ID to the query (first, top, oldest, newest, whatever).

I've tried using DISTINCT, GROUP BY, and ROW_NUMBER, but I keep getting the syntax wrong, or using them in the wrong place.

I'm also open to rewriting the query completely in a way that is reusable as I currently have to generate this on the fly (cardtypes and cardindexes are user defined) and would love to be able to create a stored procedure. Thanks in advance!

declare @cardtypes table ([ID] int, [Name] nvarchar(50))
declare @cards table ([ID] int, [CardTypeID] int, [Name] nvarchar(50))
declare @cardindexes table ([ID] int, [CardID] int, [IndexType] int, [StringVal] nvarchar(255), [DateVal] datetime)

INSERT INTO @cardtypes VALUES (1, 'Funny Cards')
INSERT INTO @cardtypes VALUES (2, 'Sad Cards')

INSERT INTO @cards VALUES (1, 1, 'Bunnies')
INSERT INTO @cards VALUES (2, 1, 'Dogs')
INSERT INTO @cards VALUES (3, 1, 'Cat')
INSERT INTO @cards VALUES (4, 1, 'Cat2')

INSERT INTO @cardindexes VALUES (1, 1, 1, 'Bunnies', null)
INSERT INTO @cardindexes VALUES (2, 1, 1, 'playing', null)
INSERT INTO @cardindexes VALUES (3, 1, 2, null, '2014-09-21')
INSERT INTO @cardindexes VALUES (4, 2, 1, 'Dogs', null)
INSERT INTO @cardindexes VALUES (5, 2, 1, 'playing', null)
INSERT INTO @cardindexes VALUES (6, 2, 1, 'poker', null)
INSERT INTO @cardindexes VALUES (7, 2, 2, null, '2014-09-22')


SELECT TOP(100)
    [ID] = c.[ID],
    [Name] = c.[Name],
    [Keyword] = [colKeyword].[StringVal],
    [DateAdded] = [colDateAdded].[DateVal]
FROM @cards AS c
LEFT JOIN @cardindexes AS [colKeyword] ON [colKeyword].[CardID] = c.ID AND [colKeyword].[IndexType] = 1
LEFT JOIN @cardindexes AS [colDateAdded] ON [colDateAdded].[CardID] = c.ID AND [colDateAdded].[IndexType] = 2
WHERE [colKeyword].[StringVal] LIKE 'p%' AND c.[CardTypeID] = 1
ORDER BY [DateAdded]

Edit:

While both solutions are valid, I ended up using the MAX() solution from @popovitsj as it was easier to implement. The issue of data coming from multiple rows doesn't really factor in for me as all rows are essentially part of the same record. I will most likely use both solutions depending on my needs.

Here's my updated query (as it didn't quite match the answer):

SELECT TOP(100)
    [ID] = c.[ID],
    [Name] = MAX(c.[Name]),
    [Keyword] = MAX([colKeyword].[StringVal]),
    [DateAdded] = MAX([colDateAdded].[DateVal])
FROM @cards AS c
LEFT JOIN @cardindexes AS [colKeyword] ON [colKeyword].[CardID] = c.ID AND [colKeyword].[IndexType] = 1
LEFT JOIN @cardindexes AS [colDateAdded] ON [colDateAdded].[CardID] = c.ID AND [colDateAdded].[IndexType] = 2
WHERE [colKeyword].[StringVal] LIKE 'p%' AND c.[CardTypeID] = 1
GROUP BY c.ID
ORDER BY [DateAdded]

Upvotes: 0

Views: 566

Answers (2)

wvdz
wvdz

Reputation: 16651

You could use MAX or MIN to 'decide' on what to display for the other columns in the rows that are duplicate.

SELECT ID, MAX(Name), MAX(Keyword), MAX(DateAdded)
(...)
GROUP BY ID;

Upvotes: 2

Ryan B.
Ryan B.

Reputation: 3665

using row number windowed function along with a CTE will do this pretty well. For example:

;With preResult AS (
SELECT TOP(100)
    [ID] = c.[ID],
    [Name] = c.[Name],
    [Keyword] = [colKeyword].[StringVal],
    [DateAdded] = [colDateAdded].[DateVal],
    ROW_NUMBER()OVER(PARTITION BY c.ID ORDER BY [colDateAdded].[DateVal]) rn
FROM @cards AS c
LEFT JOIN @cardindexes AS [colKeyword] ON [colKeyword].[CardID] = c.ID AND [colKeyword].[IndexType] = 1
LEFT JOIN @cardindexes AS [colDateAdded] ON [colDateAdded].[CardID] = c.ID AND [colDateAdded].[IndexType] = 2
WHERE [colKeyword].[StringVal] LIKE 'p%' AND c.[CardTypeID] = 1
ORDER BY [DateAdded]
)

SELECT * from preResult WHERE rn = 1

Upvotes: 2

Related Questions