Colin Hicks
Colin Hicks

Reputation: 360

Aggregating on a column that is also being grouped on

I know there's a lot of confusion related to grouping/aggregation etc, and I thought that I had a pretty decent grasp on the whole thing until I saw something along the lines of

SELECT A, SUM(B)
FROM T
GROUP BY A
HAVING COUNT(A)>1;

At first this puzzled me since it seemed performing an aggregate on a column that is also being grouped on is redundant, since by definition the value for the group will be distinct. But then I thought about it and it kind of makes sense for duplicate values in the table, if the aggregation was done before the grouping. In my head, it seems like its treating it more like this kind of query

SELECT A, SUM(B)
FROM T
WHERE A in (SELECT A FROM T GROUP BY A HAVING COUNT(*)>1)
GROUP BY A;

As opposed to another selection operator on each group after the grouping is done (since to me that doesn't make much sense).

So my question is multifold: Can elements being grouped on be included in the HAVING clause at all? Can elements being grouped on be aggregated on (in the HAVING clause or elsewhere like SELECT clause)? If the previous statements hold, is my understanding of what this operation means correct?

NOTE: This question is mainly about standard (ansi) SQL but info on particular implementations would also be interesting

Upvotes: 0

Views: 46

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270713

The arguments to an aggregation function can include the keys being aggregated.

That said, the more common way to count rows in each group is to use COUNT(*). I would recommend:

SELECT A, SUM(B)
FROM T
GROUP BY A
HAVING COUNT(*) > 1;

There is a slight overhead to using COUNT(A) because the value of A needs to be checked against NULL in each row.

Upvotes: 1

Related Questions