Reputation: 11
I'm just beginning SQL, forgive me for my very basic questions. I have two. Here is the relevant code:
SELECT column_1, column_2, COUNT(*) AS total
FROM table1
This simple query shows a single row for all identical instances. The COUNT column will show there are multiple instances, but it will only show up as one row. Even without COUNT, there is only one row. Why only one row if I didn't use DISTINCT? Without DISTINCT it would seem that all identical rows would show up individually.
What does the * do in COUNT. I understand it usually indicates "all", but what does it change in this case?
Thank you.
Upvotes: 1
Views: 93
Reputation: 350310
This query:
SELECT column_1, column_2, COUNT(*) AS total
FROM table1
violates a rule in SQL: when using aggregation functions (like count), any other expression in the select
clause should either:
group by
clauseThere are some nuances to this rule (like functional dependency), but this shows that the above SQL is ambiguous: count(*)
without group by
will ensure you get only one record in your output, but then it is not clear what the values would be of those other two expressions? Which record would they be based on? Some database engines have allowed these constructs, and choose a record to base those column values on.
However, if you remove all aggregations, you should get all the rows of your table:
SELECT column_1, column_2
FROM table1
If you want to get a row for each distinct value of column_1, column_2
, each with a count, then use group by
SELECT column_1, column_2, COUNT(*) AS total
FROM table1
GROUP BY column_1, column_2
The number of rows in the result will depend on how many distinct pairs of column1, column2
appear in your table.
As to the *
in COUNT(*)
: the alternative would be to mention an expression, like COUNT(column3)
: in that case null
values will not be counted. *
thus can be understood to mean: count all records, even when they have null
values.
Upvotes: 1
Reputation: 562378
https://dev.mysql.com/doc/refman/8.0/en/group-by-handling.html says:
Without GROUP BY, there is a single group and it is nondeterministic which name value to choose for the group.
That is, the use of any aggregating function such as COUNT() causes all the rows in the table to be treated as a single group. The result is one row for that group.
If it didn't work this way, there might not be any way to do an aggregation against the whole table.
You also asked about the purpose of the *
. By default, if you use COUNT(<expression>)
the rows where the expression is NULL are not counted. But COUNT(*)
is a special syntax that always counts all the rows, so you don't have to think up an expression that is guaranteed to be non-NULL.
Some people use a constant value that is not NULL, e.g. COUNT(1)
, which achieves the same result. But it makes some people wonder if COUNT(2)
would somehow count the rows differently (it doesn't). The standard SQL specification provides COUNT(*)
as a special syntax to make this more clear.
Upvotes: 1
Reputation: 1804
The * is just a short cut for all the columns.
If you select * from table1 it will display all the columns without requiring you to write them all out.
COUNT() is an aggregate function. If you pass COUNT(column_1) it would just count the rows with a column_1 value. So COUNT(*) is the same as writing COUNT(column_1, column_2, ...).
I'm thinking your SQL may work better with a group by clause at the end.
SELECT column_1, column_2, COUNT(*) AS total
FROM table1 GROUP BY column_1, column_2
Upvotes: -1