Reputation: 1341
I was looking at some SQL query that I have in Access database that I did not make.
One of the SQL query goes something like this:
select column1 from table1 group by column1 having count(*)>1
The purpose of this query is to find the value in column1
that appears more than once. I can verify that this query works correctly and returns the column value that appears more than once.
I however do not understand why this query works. As per my understanding using group by
will remove duplicate fields. For instance if column1 had
column1
apple
mango
mango
Doing group by (column1)
will result
column1
apple
mango
At this point, if we perform having count(*)>1
or having count(column1)>1
, it should return no result because group by has already removed the duplicate field. But clearly, I am wrong as the above SQL statement does give the accurate result.
Would you please let me know the problem in my understanding?
Edit 1:
Besides the accepted answer, I this article which deals with order of SQL operation really helped my understanding
Upvotes: 4
Views: 490
Reputation: 1269773
You are misunderstanding how HAVING
works. In fact, you can think of it by using subqueries. Your query is equivalent to:
select column1
from (select column1, count(*) as cnt
from table1
group by column1
) as t
having cnt > 1;
That is, having
filters an aggregation query after the aggregation has taken place. However, the aggregation functions are applied per group. So count(*)
is counting the number of rows in each group. That is why it is identifying duplicates.
Upvotes: 2
Reputation: 1358
GROUP BY clause groups the selection per the fields you mention, on this case column1 but can be a combined column (e.g. column1, column2). By the way, I think if you run:
SELECT column1, Count(*) AS [Count], MIN(column2) AS MinColumn2, MAX(column2) AS MaxColumn2
FROM table1
GROUP BY column1;
Will help you to understand how GROUPING works. When filter by any column directly you may use the WHERE condition but if you want to filter per any field calculated from the grouping you need to use the HAVING clause.
Upvotes: 1
Reputation: 311308
group by
doesn't just remove duplicate values - it returns one row per distinct value of the group by clause, and allows you apply aggregate function per such unique value.
In this query, you actually query the values of column1
and the result of count(*)
per value of column1, then, you use the having
clause to return only the values of column1
that have a count(*)
greater than 1.
Upvotes: 2