Aboud Zakaria
Aboud Zakaria

Reputation: 567

Remove duplicated data complex sql query

I have the following complex query

SELECT TOP 20
    [image].Image_id, Source_Link
FROM 
    Item
INNER JOIN 
    [Image] ON Item.Image_id = [Image].Image_id
INNER JOIN 
    Age_Item ON Item.Item_id = Age_Item.Item_id
WHERE 
    Gender = 'male'
    AND Age_Item.Age_id = 4
    AND [Image].Image_id NOT IN
        (SELECT [Image].Image_id
         FROM [Image]
         INNER JOIN Item on [Image].Image_id = Item.Image_id
         INNER JOIN Session_Item on Item.Item_id = Session_Item.Item_id
         WHERE Session_Item.Session_id = 3)
ORDER BY 
    NEWID() -- random rows

however, I realized that sometimes it shows some duplicated results and I wanted to remove the duplicated rows

I tried SELECT DISTINCT but it shows this error :

ORDER BY items must appear in the select list if SELECT DISTINCT is specified.

and I can't use GROUP BY([Image].Image_id) in this case because the column [Image].Source_Link is not contained in an aggregate function.

What should I do to fix this issue?

//EDIT: I've fixed it that way, thanks for your answers guys

SELECT TOP 20 * FROM
(
SELECT DISTINCT [image].Image_id, Source_Link
FROM Item
INNER JOIN [Image] ON Item.Image_id = [Image].Image_id
INNER JOIN Age_Item ON Item.Item_id = Age_Item.Item_id
WHERE Gender='male'
AND Age_Item.Age_id = 4
AND [Image].Image_id NOT IN
(
    SELECT [Image].Image_id
    FROM [Image]
    INNER JOIN Item on [Image].Image_id = Item.Image_id
    INNER JOIN Session_Item on Item.Item_id = Session_Item.Item_id
    WHERE Session_Item.Session_id = 3
)
)query ORDER BY NEWID();

Upvotes: 0

Views: 75

Answers (2)

Conrad Frix
Conrad Frix

Reputation: 52645

Since you want a random 20 distinct and order randomly you'll need to get a distinct set and then apply your TOP 20 ORDER BY NEW_ID after.

You can do that with a CTE

with cte as (

SELECT DISTINCT [image].Image_id, Source_Link
FROM Item
INNER JOIN [Image] ON Item.Image_id = [Image].Image_id
INNER JOIN Age_Item ON Item.Item_id = Age_Item.Item_id
WHERE Gender='male'
AND Age_Item.Age_id = 4
AND [Image].Image_id NOT IN
(
    SELECT [Image].Image_id
    FROM [Image]
    INNER JOIN Item on [Image].Image_id = Item.Image_id
    INNER JOIN Session_Item on Item.Item_id = Session_Item.Item_id
    WHERE Session_Item.Session_id = 3
)
)
Select TOP 20 DISTINCT * FROM CTE ORDER BY NEWID() 

Upvotes: 1

Richard Hansell
Richard Hansell

Reputation: 5403

SELECT DISTINCT * FROM ( <your original query> ) query;

I just tried the following:

SELECT DISTINCT * FROM (
SELECT TOP 20 [image].Image_id, Source_Link
FROM Item
INNER JOIN [Image] ON Item.Image_id = [Image].Image_id
INNER JOIN Age_Item ON Item.Item_id = Age_Item.Item_id
WHERE Gender='male'
AND Age_Item.Age_id = 4
AND [Image].Image_id NOT IN
(
    SELECT [Image].Image_id
    FROM [Image]
    INNER JOIN Item on [Image].Image_id = Item.Image_id
    INNER JOIN Session_Item on Item.Item_id = Session_Item.Item_id
    WHERE Session_Item.Session_id = 3
)
ORDER BY NEWID()) query;

...and it compiled fine for me.

Upvotes: 1

Related Questions