Reputation: 1341

Group by and having trouble understanding

I was looking at some SQL query that I have in Access database that I did not make.

One of the SQL query goes something like this:

select column1 from table1 group by column1 having count(*)>1

The purpose of this query is to find the value in column1 that appears more than once. I can verify that this query works correctly and returns the column value that appears more than once.

I however do not understand why this query works. As per my understanding using group by will remove duplicate fields. For instance if column1 had

    column1
    apple
    mango
    mango

Doing group by (column1) will result

    column1
    apple
    mango

At this point, if we perform having count(*)>1 or having count(column1)>1, it should return no result because group by has already removed the duplicate field. But clearly, I am wrong as the above SQL statement does give the accurate result.

Would you please let me know the problem in my understanding?

Edit 1:

Besides the accepted answer, I this article which deals with order of SQL operation really helped my understanding

Upvotes: 4

Answers (3)

Gordon Linoff

Reputation: 1269773

You are misunderstanding how HAVING works. In fact, you can think of it by using subqueries. Your query is equivalent to:

select column1
from (select column1, count(*) as cnt
      from table1
      group by column1
     ) as t
having cnt > 1;

That is, having filters an aggregation query after the aggregation has taken place. However, the aggregation functions are applied per group. So count(*) is counting the number of rows in each group. That is why it is identifying duplicates.

Upvotes: 2

Angel M.

Reputation: 1358

GROUP BY clause groups the selection per the fields you mention, on this case column1 but can be a combined column (e.g. column1, column2). By the way, I think if you run:

SELECT column1, Count(*) AS [Count], MIN(column2) AS MinColumn2, MAX(column2) AS MaxColumn2
FROM table1 
GROUP BY column1;

Will help you to understand how GROUPING works. When filter by any column directly you may use the WHERE condition but if you want to filter per any field calculated from the grouping you need to use the HAVING clause.

Upvotes: 1

Mureinik

Reputation: 311308

group by doesn't just remove duplicate values - it returns one row per distinct value of the group by clause, and allows you apply aggregate function per such unique value.

In this query, you actually query the values of column1 and the result of count(*) per value of column1, then, you use the having clause to return only the values of column1 that have a count(*) greater than 1.

Upvotes: 2

Group by and having trouble understanding

Answers (3)

Related Questions