Reputation: 23
I am using T-SQL to return records from the database (there are multiple criteria, but the list of unique ID's must be distinct), in short, the T-SQL looks like this:
SELECT
t1.ID,
[query1mark] = 1
WHERE criteria1 = 1
UNION
SELECT
t1.ID,
[query2mark] = 1
WHERE criteria2 = 1
I would like to be able to use Union to de-dupe on the ID field (the data has to be unique on the ID field), whilst retaining the derived column "query1mark" or "query2mark" to highlight which query it additionally came from. In my real world case, there are 5 queries that need to be de-duped against each other, so I need an efficient solution is possible.
EDIT: Additionally, the results from the first query need to be prioritised over those from the second query, and the results from the second query need to be prioritised over those from the third query, as I understand, this feature is inherent when using Union, as it will only add records from below the Union statement.
Is Union the best solution for this, and if not, what can I use?
Thanks
Upvotes: 0
Views: 125
Reputation: 13959
you can use top 1 with ties
SELECT top 1 with ties * FROM yourtable
ORDER BY ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [criteria])
Upvotes: 0
Reputation: 43646
What about this:
DECLARE @DataSource TABLE
(
[ID] INT
,[criteria] INT
);
INSERT INTO @DataSource ([ID], [criteria])
VALUES (1, 1)
,(1, 2)
,(2, 1)
,(3, 1)
,(3, 2)
,(4, 2);
WITH DataSource ([ID], [query_mark], [RowID]) AS
(
SELECT *
,ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [criteria] ASC)
FROM @DataSource
)
SELECT [id], [query_mark]
FROM DataSource
WHERE [RowID] = 1;
The idea is to create sequence of all duplicated elements for particular group. The duplicates are order by the criteria
field, but yo can change the logic if you need - for example to show the biggest criteria
. The group is defined using the PARTITION BY [ID]
statement, which means, order items for each [ID]
group. Then, in the select, we only need to show one record per each group [RowID] = 1
Upvotes: 1