Austin
Austin

Reputation: 11

Basic SQL question: displaying single results of multiple rows

I'm just beginning SQL, forgive me for my very basic questions. I have two. Here is the relevant code:

SELECT column_1, column_2, COUNT(*) AS total
FROM table1
  1. This simple query shows a single row for all identical instances. The COUNT column will show there are multiple instances, but it will only show up as one row. Even without COUNT, there is only one row. Why only one row if I didn't use DISTINCT? Without DISTINCT it would seem that all identical rows would show up individually.

  2. What does the * do in COUNT. I understand it usually indicates "all", but what does it change in this case?

Thank you.

Upvotes: 1

Views: 93

Answers (3)

trincot
trincot

Reputation: 350310

This query:

SELECT column_1, column_2, COUNT(*) AS total
FROM table1

violates a rule in SQL: when using aggregation functions (like count), any other expression in the select clause should either:

  • also be an aggregation
  • should be grouped by in the group by clause

There are some nuances to this rule (like functional dependency), but this shows that the above SQL is ambiguous: count(*) without group by will ensure you get only one record in your output, but then it is not clear what the values would be of those other two expressions? Which record would they be based on? Some database engines have allowed these constructs, and choose a record to base those column values on.

However, if you remove all aggregations, you should get all the rows of your table:

SELECT column_1, column_2
FROM table1

If you want to get a row for each distinct value of column_1, column_2, each with a count, then use group by

SELECT column_1, column_2, COUNT(*) AS total
FROM table1
GROUP BY column_1, column_2

The number of rows in the result will depend on how many distinct pairs of column1, column2 appear in your table.

As to the * in COUNT(*): the alternative would be to mention an expression, like COUNT(column3): in that case null values will not be counted. * thus can be understood to mean: count all records, even when they have null values.

Upvotes: 1

Bill Karwin
Bill Karwin

Reputation: 562378

https://dev.mysql.com/doc/refman/8.0/en/group-by-handling.html says:

Without GROUP BY, there is a single group and it is nondeterministic which name value to choose for the group.

That is, the use of any aggregating function such as COUNT() causes all the rows in the table to be treated as a single group. The result is one row for that group.

If it didn't work this way, there might not be any way to do an aggregation against the whole table.

You also asked about the purpose of the *. By default, if you use COUNT(<expression>) the rows where the expression is NULL are not counted. But COUNT(*) is a special syntax that always counts all the rows, so you don't have to think up an expression that is guaranteed to be non-NULL.

Some people use a constant value that is not NULL, e.g. COUNT(1), which achieves the same result. But it makes some people wonder if COUNT(2) would somehow count the rows differently (it doesn't). The standard SQL specification provides COUNT(*) as a special syntax to make this more clear.

Upvotes: 1

Adrian White
Adrian White

Reputation: 1804

The * is just a short cut for all the columns.

If you select * from table1 it will display all the columns without requiring you to write them all out.

COUNT() is an aggregate function. If you pass COUNT(column_1) it would just count the rows with a column_1 value. So COUNT(*) is the same as writing COUNT(column_1, column_2, ...).

I'm thinking your SQL may work better with a group by clause at the end.

SELECT column_1, column_2, COUNT(*) AS total
FROM table1 GROUP BY column_1, column_2

Upvotes: -1

Related Questions