Rominus
Rominus

Reputation: 1221

Why does this SQL query need DISTINCT?

I've written a query to filter a table based on criteria found in a master table, and then remove rows that match a third table. I'm executing the query in Access, so I can't use MINUS. It works, but I found that it returns duplicate rows for some, but not all, of the selected records. I fixed it with DISTINCT, but I don't know why it would return duplicates in the first place. It's a pretty simple query:

select distinct sq.*
from 
    (select List_to_Check.*, Master_List.SELECTION_VAR
        from List_to_Check
        left join Master_List
        on List_to_Check.SUB_ID = Master_List.SUB_ID
        where Master_List.SELECTION_VAR = 'criteria'
    ) as sq 
left join List_to_Exclude
on sq.SUB_ID = List_to_Exclude.SUB_ID
where List_to_Exclude.SUB_ID is null
;

Edit: The relationships between all three tables are 1-to-1 on the SUB_ID var. Combined with using a LEFT JOIN, I would expect one line per ID.

Upvotes: 0

Views: 128

Answers (2)

JeffO
JeffO

Reputation: 8043

I recommend breaking your query apart and checking for duplicates. My guess is that it's your data/ the sub_ID isn't very unique.

Start with you sub query since you're returning all of those columns. If you get duplicates there, your query is going to return duplicates regardless of what is in your exclusion table.

Once you have those duplicates cleared up, check the exclusion table for duplicate sub_Id.

To save time in trouble-shooting, if there are known culprits that are duplicates, you may want to limit the returned values, so you can focus on the peculiarities of those data.

I'm not sure this is a problem, but look into the logic on

on List_to_Check.SUB_ID = 
    Master_List.SUB_ID
where Master_List.SELECTION_VAR = 'criteria'

Where clauses on data in the right side of a left outer join may not be returning the data you expect. Try this and see what happens:

on List_to_Check.SUB_ID = Master_List.SUB_ID
and Master_List.SELECTION_VAR = 'criteria'

Upvotes: 1

Amit Mahajan
Amit Mahajan

Reputation: 915

The inner query joins List_to_Check and master but the outer query joins List_to_Exclude with Subscriber(maybe you can change the names i call these 3 tables)

To avoid duplicates you need to use one of the table in both the queries inner and outer. This will avoid duplicates.

Upvotes: 0

Related Questions