Joe
Joe

Reputation: 1958

Weird behavior with MySQL GroupBy on multiple columns

I am writing a query that groups a table of users by (height, sex) and then gets the name of the user with the minimum weight for that combination of height and sex.

If you wanna skip to the actual query, here is the SQLFiddle with a minimal representative example.

If you don't want to open that url, here is what I am talking about. Given a table such as:

| height | sex    | name    | weight |
|-------:|--------|---------|--------|
| 100    | female | Alice   | 150    |
| 100    | female | Barbara | 130    |
| 100    | female | Candice | 100    |

and a query that does this as follows:

SELECT name, min(weight) from users
group by height, sex

Why does the query output:

|  name | min(weight) |
|------:|-------------|
| Alice | 100         |

What I actually want is Candice, 100 not Alice, 100.

I figured out it chooses Alice because it is the first row, but why does it do that? It's really weird and unexpected.

Upvotes: 0

Views: 123

Answers (3)

GMB
GMB

Reputation: 222482

You are not using GROUP BY correctly. It is a general rule in most RDBMS that all non-aggregated columns must appear in the GROUP BY clause. Only ancient versions of MySQL allow breaking that rule (in newer versions, you need to disable option ONLY_FULL_GROUP_BY), but that can lead to unpredictable results, as you are experiencing here.

To exhibit the user that has the smallest weight within groups of recrods sharing the same height and sex, one solution would be to use a NOT EXISTS condition with a correlated subquery, like:

SELECT *
FROM users u
WHERE NOT EXISTS (
    SELECT 1
    FROM users u1
    WHERE 
        u1.height = u.height 
        AND u1.sex = u.sex
        AND u1.weight < u.weight
)

Rationale: the NOT EXITS condition combined with the subquery eliminates records for which another record exists with the same height and sex but with a smaller weight, leaving in the result set only the record that has the smallest weight for each height/sex tuple. We use SELECT 1 because we do not need the subquery to return an actual result, we just want to know if it returns something.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269873

I would use a correlated subquery, but not with fancy logic:

select u.*
from users u
where u.weight = (select min(u2.weight)
                  from users u2
                  where u2.height = u.height and u2.sex = u.sex
                 );

That is, return the users whose weight is the minimum for the height/sex combination.

Upvotes: 1

nbk
nbk

Reputation: 49375

With these data

CREATE TABLE users
(`height` varchar(8), `sex` varchar(8), `name` varchar(9), `weight` varchar(8))
;

INSERT INTO users
(`height`, `sex`, `name`, `weight`)
VALUES
('100', 'female', 'Alice', '150'),
('100', 'female', 'Barbara', '130'),
('100', 'female', 'Candice', '100'),
('190', 'male', 'John', '185'),
('190', 'male', 'Bert', '130'),
('190', 'male', 'John', '113')
;

And this sql statement

Select name, u.weight
from users u inner join 
(Select min(weight) weight ,sex,height
From users
Group by height, sex ) u1 
On u.sex= u1.sex and u.height = u1.height
and u.weight = u1.weight
GROUP By u.height,u.sex;

You get following Result

name        weight
Candice     100
John        113

The second select statement u1 get u alle the minimum weghts for all groups sex and height, the rest is a simple inner join, where only the correlating names are chosen

Upvotes: 0

Related Questions