MCO
MCO

Reputation: 314

Group by and Select Distinct in SQL Server

What are the differences between the following two queries?

SELECT distinct(Invalid_Emails), [leads_id] 
FROM [dbo].[InvalidEmails_stg] 
ORDER BY LEADS_ID DESC

vs

select invalid_emails, max(leads_id) as id 
from invalidEmails_stg 
group by invalid_emails 
having count(*) < 2 
order by id desc

The second one gave me fewer rows than the first.

Upvotes: 0

Views: 70

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1270873

You are confused by the parentheses in the first query. They are doing nothing, so write the query as:

SELECT DISTINCT Invalid_Emails, leads_id 
FROM [dbo].[InvalidEmails_stg]
ORDER BY LEADS_ID DESC;

This returns all pairs of Invalid_Emails/Leads_id that appear in the database. No matter how many times a given pair appears, it will be in the result set exactly one time.

This query:

select invalid_emails, max(leads_id) as id 
from invalidEmails_stg 
group by invalid_emails 
having count(*) < 2 
order by id desc;

Returns invalid_emails/leads_id pairs that occur only once in your data. It filters out any pairs that occur more than once.

Here is a simple example:

invalid_emails   leads_id
    [email protected]         1
    [email protected]         1
    [email protected]         2
    [email protected]         3
    [email protected]         1

The first query will return:

    [email protected]         1
    [email protected]         2
    [email protected]         3
    [email protected]         1

[email protected] is returned once because duplicates are removed.

The second will return:

    [email protected]         2
    [email protected]         3
    [email protected]         1

[email protected] is not returned because it appears twice.

Upvotes: 2

Snuka
Snuka

Reputation: 11

The queries have similar intent, to get a invalid_emails by leads_id.

The 2nd query uses aggregate functions to only bring back the maximum leads_id, and uses a having clause to remove duplicates.

Upvotes: 0

mhd.cs
mhd.cs

Reputation: 721

In first query

SELECT distinct(Invalid_Emails),[leads_id] 
FROM [dbo].[InvalidEmails_stg] 
ORDER BY LEADS_ID DESC

you dont Check Constraint < 2

Actually in Second query :

select invalid_emails, max(leads_id) as id 
from invalidEmails_stg 
group by invalid_emails 
having count(*)<2 
order by id desc

if result Contain two or more than row Having Count(*) Filter Your Result .

another diffrence is NULL value . if Column Invalid_Emails having Null Value Appear in First Query and Filter By group by in Next Query

Upvotes: 1

Related Questions