Xiang

Reputation: 314

Why do columns in the SELECT list without an aggregate function need to be part of the GROUP BY clause in MySQL?

The tables and a detailed description are in question 11 of the SQLZOO JOIN tutorial: https://sqlzoo.net/wiki/The_JOIN_operation

The question asks: 'For every match involving 'POL', show the matchid, date and the number of goals scored.'

Below is my SQL code:

    SELECT matchid, mdate, COUNT(player)
    FROM game JOIN goal ON id = matchid
    WHERE (team1 = 'POL' OR team2 = 'POL')
    GROUP BY matchid

The error message is: 'gisq.game.mdate' isn't in GROUP BY. It turned out that I have to change the last line of my code to GROUP BY matchid, mdate.

I am learning SQL now. Can anyone help me understand why I have to group by both matchid and mdate here instead of just one of them?

Thank you in advance for your kind help.

Upvotes: 0

Views: 600

Answers (1)

Somy

Reputation: 1624

When GROUP BY is present, or any aggregate function is used, the SELECT list may not refer to ungrouped columns except inside aggregate functions (such as SUM, MAX or MIN, which return a single value for each group). Otherwise there could be more than one possible value to return for an ungrouped column, and the query is rejected rather than returning an arbitrary one. In MySQL this check is enforced when the ONLY_FULL_GROUP_BY SQL mode is enabled, which is what produces the "isn't in GROUP BY" error you are seeing.
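To make the "more than one possible value" point concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The table layout follows the SQLZOO tables, but the rows, dates and player names are invented for illustration. It counts how many different values each ungrouped column takes inside every matchid group.

```python
import sqlite3

# Hypothetical miniature of the SQLZOO tables; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE game (id INTEGER PRIMARY KEY, mdate TEXT, team1 TEXT, team2 TEXT);
    CREATE TABLE goal (matchid INTEGER, teamid TEXT, player TEXT, gtime INTEGER);
    INSERT INTO game VALUES (1001, '2012-06-08', 'POL', 'GRE'),
                            (1004, '2012-06-12', 'POL', 'RUS');
    INSERT INTO goal VALUES (1001, 'POL', 'Player A', 17),
                            (1001, 'GRE', 'Player B', 51),
                            (1004, 'RUS', 'Player C', 37),
                            (1004, 'POL', 'Player D', 57);
""")

# For each matchid group, count the distinct values of two ungrouped columns.
rows = conn.execute("""
    SELECT matchid,
           COUNT(DISTINCT mdate)  AS distinct_dates,
           COUNT(DISTINCT player) AS distinct_players
    FROM game JOIN goal ON id = matchid
    WHERE team1 = 'POL' OR team2 = 'POL'
    GROUP BY matchid
    ORDER BY matchid
""").fetchall()

for row in rows:
    print(row)
```

With this sample data every group has a single mdate but two players. A bare player column would therefore be ambiguous. A bare mdate happens to be safe here, but a strict GROUP BY check has to be told so, either by grouping on it or by wrapping it in an aggregate.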

However, there are multiple workarounds to this.

Option 1. What you did yourself: add the other column to the GROUP BY.

    SELECT
          matchid
        , mdate
        , COUNT(player)
    FROM game
      JOIN goal
         ON id = matchid
    WHERE (team1 = 'POL' OR team2 = 'POL')
    GROUP BY matchid, mdate;

Option 2. In this case you can instead apply an aggregate function to the other column, as below. Because mdate is functionally dependent on matchid (each match has exactly one date), any aggregate function that picks a value, such as MAX or MIN, returns the same result.

    SELECT
          matchid
        , MAX(mdate) AS mdate
        , COUNT(player)
    FROM game
      JOIN goal
         ON id = matchid
    WHERE (team1 = 'POL' OR team2 = 'POL')
    GROUP BY matchid;
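As a quick check of the claim that the choice of aggregate does not matter here, the sketch below (again Python's sqlite3 as a stand-in, with invented rows) selects both MIN(mdate) and MAX(mdate) per match so the two can be compared.

```python
import sqlite3

# Hypothetical sample data shaped like the SQLZOO tables; values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE game (id INTEGER PRIMARY KEY, mdate TEXT, team1 TEXT, team2 TEXT);
    CREATE TABLE goal (matchid INTEGER, teamid TEXT, player TEXT, gtime INTEGER);
    INSERT INTO game VALUES (1001, '2012-06-08', 'POL', 'GRE'),
                            (1004, '2012-06-12', 'POL', 'RUS');
    INSERT INTO goal VALUES (1001, 'POL', 'Player A', 17),
                            (1001, 'GRE', 'Player B', 51),
                            (1004, 'RUS', 'Player C', 37),
                            (1004, 'POL', 'Player D', 57);
""")

# MIN and MAX see the same single date repeated on every goal row of a match.
rows = conn.execute("""
    SELECT matchid, MIN(mdate), MAX(mdate), COUNT(player)
    FROM game JOIN goal ON id = matchid
    WHERE (team1 = 'POL' OR team2 = 'POL')
    GROUP BY matchid
    ORDER BY matchid
""").fetchall()

for matchid, earliest, latest, goals in rows:
    print(matchid, earliest == latest, goals)
```

This shortcut is only safe because each match has one date. On a column that really varies within a group, MIN and MAX would differ and the aggregate would silently pick one of the values.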

Option 3. You can calculate the aggregate in a subquery and then join the result back to the game table to get the additional columns you need, as below.

    SELECT
          t1.matchid
        , t2.mdate
        , t1.count_player
    FROM (
        SELECT
              matchid
            , COUNT(player) AS count_player
        FROM game
          JOIN goal
             ON id = matchid
        WHERE (team1 = 'POL' OR team2 = 'POL')
        GROUP BY matchid
    ) t1
      JOIN game t2
         ON t1.matchid = t2.id;

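Option 4. You can also use a window function and keep only the distinct rows. The window count is repeated on every goal row of a match, so DISTINCT is needed to collapse them to one row per match.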

    SELECT DISTINCT
          matchid
        , mdate
        , COUNT(player) OVER (PARTITION BY matchid) AS count_player
    FROM game
      JOIN goal
         ON id = matchid
    WHERE (team1 = 'POL' OR team2 = 'POL');
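The four options can be compared side by side with a small harness. This is only a sketch: it runs on SQLite through Python's sqlite3 module rather than on MySQL, the rows are invented, and the dictionary keys are labels chosen here for illustration. The window-function variant is skipped on SQLite builds older than 3.25, which lack window functions.

```python
import sqlite3

# Hypothetical sample data; match 1002 does not involve POL and must be filtered out.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE game (id INTEGER PRIMARY KEY, mdate TEXT, team1 TEXT, team2 TEXT);
    CREATE TABLE goal (matchid INTEGER, teamid TEXT, player TEXT, gtime INTEGER);
    INSERT INTO game VALUES (1001, '2012-06-08', 'POL', 'GRE'),
                            (1002, '2012-06-08', 'RUS', 'CZE'),
                            (1004, '2012-06-12', 'POL', 'RUS');
    INSERT INTO goal VALUES (1001, 'POL', 'Player A', 17),
                            (1001, 'GRE', 'Player B', 51),
                            (1002, 'RUS', 'Player C', 15),
                            (1004, 'RUS', 'Player C', 37),
                            (1004, 'POL', 'Player D', 57),
                            (1004, 'POL', 'Player A', 80);
""")

SRC = "FROM game JOIN goal ON id = matchid WHERE (team1 = 'POL' OR team2 = 'POL')"

queries = {
    # Option 1: group by both columns.
    "group_by_both": f"SELECT matchid, mdate, COUNT(player) {SRC} GROUP BY matchid, mdate",
    # Option 2: wrap the dependent column in an aggregate.
    "aggregate_mdate": f"SELECT matchid, MAX(mdate), COUNT(player) {SRC} GROUP BY matchid",
    # Option 3: aggregate in a subquery, then join back to game.
    "subquery_join": (
        "SELECT t1.matchid, t2.mdate, t1.count_player "
        f"FROM (SELECT matchid, COUNT(player) AS count_player {SRC} GROUP BY matchid) t1 "
        "JOIN game t2 ON t1.matchid = t2.id"
    ),
}

# Option 4 needs window functions, available in SQLite 3.25 and later.
if sqlite3.sqlite_version_info >= (3, 25, 0):
    queries["window_distinct"] = (
        f"SELECT DISTINCT matchid, mdate, COUNT(player) OVER (PARTITION BY matchid) {SRC}"
    )

# Sort each result so the comparison does not depend on row order.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in queries.items()}

for name, rows in results.items():
    print(name, rows)
```

On this sample data every variant returns the same two rows, one per match involving POL, so the choice between them is mostly about readability and which columns you need to carry along.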

Upvotes: 1
