Reputation: 18657
I have a query that is not grouping properly and returning the wrong results and I can't figure out what the problem is.
The query is shown below. FYI - It's not obvious in it's current form why I need the group by because I've removed all other parts of the query to get to the most basic form where I see a problem.
SELECT * FROM (
SELECT *
FROM notifications n
WHERE 1
-- and group_id = '5b35c8eb075881f8bbdfbcb36b052aa7'
GROUP BY `from`
) t
WHERE group_id = '5b35c8eb075881f8bbdfbcb36b052aa7'
The problem is that when I use put the where on the inside subquery (currently commented out), for this case, I end up with 4 results. Each of the 4 results have a different "from" value so should be listed separately. When I put the where on the outside of the subquery I end up with 3 results.
For completeness the table definition is:
CREATE TABLE `notifications` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`mem_id` int(10) unsigned DEFAULT NULL,
`type` varchar(255) NOT NULL,
`from` varchar(255) DEFAULT NULL,
`entry_id` int(11) DEFAULT NULL,
`parent_id` int(11) DEFAULT NULL,
`table_id` varchar(255) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`emailed` tinyint(1) DEFAULT NULL,
`read` tinyint(1) NOT NULL,
`group_id` char(32) NOT NULL,
PRIMARY KEY (`id`),
KEY `mem_id` (`mem_id`),
KEY `created_at` (`created_at`),
KEY `entry_id` (`entry_id`),
KEY `parent_id` (`parent_id`),
KEY `group_id` (`group_id`)
)
Any ideas what could cause this? I'm completely stumped. At this point I'm ready to attribute it to some bug in mysql but that also seems unlikely.
I wasn't clear by what I meant by "wrong results" There were 7 records in the data set with this group_id. There were 2 records with a unique "from" and 5 more records with 2 other "from" ids (one had 3 records, one had 2).
Doing the where for the group by on the inside resulted in in the 4 records that I wanted. I don't care about which row was selected as the result because I'm doing other sums/counts which I excluded from the example because it wasn't directly relevant to the problem.
If I do the where on the outer group by one of the two records with a single "from" did not return at all.
I'll try to update with a sqlfiddle (didn't know about that!) - the issue is that this database I was testing on is wiped daily so I don't have the original data, I'll see if I can reproduce.
I noticed that in my questions, I've been referring to inner and outer group by - the group by is always on the inner query it's just where the "where" is. I've tried to adjust the phrasing. Again, it's not immediately obvious why I care about the location of the where - but in my final use case, I need the selection to happen on the outside (I'm building a count of notifications that are read/unread and I need a count both per member and total per message - eg the group_id)
sqlfiddle: http://www.sqlfiddle.com/#!2/7d746/5
screenshot of query with inner where:https://www.evernote.com/shard/s48/sh/e355e96e-e48d-4550-bbaf-ffb18bc0bb9c/08e2454867e00e3a05535303429748f1
screenshot of query with outer where:https://www.evernote.com/shard/s48/sh/60b10427-e417-4196-8b92-7d6d8031d21e/c779bc9c46d23472983ac6fa0d25e42d
With the sqlfiddle I get back 4 results each time! Which leads me more to think it's a server issue. We're running MySQL 5.5.28-29.2 Percona Server (GPL), Release rel29.2, Revision 360
Upvotes: 0
Views: 16704
Reputation: 36127
This query:
SELECT *
FROM notifications n
WHERE 1
GROUP BY `from`
is simply wrong in ANSI SQL and on almost all DBMS (oracle, postgres, MS SQL etc.).
It runs on MySql only because of their nonstandard group by extension
See this link: http://dev.mysql.com/doc/refman/5.0/en/group-by-extensions.html
Hovever they warn about something:
However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
Because of this "feature" your query (select from select * group by) is unpredicable, results are dependent on the order of records in the table.
Take a look at this simple demo: http://www.sqlfiddle.com/#!2/b762e/2
There are two identic tables in this demo, with the same content, the only difference is a physical rows order. And the same queries give completely different results.
---- EDIT how to solve this problem -----
To solve this problem in your query, just add both columns to the GROUP BY clause.
select * FROM (
SELECT * FROM notifications n
GROUP BY `from`, `group_id`
) x
WHERE group_id = 'A';
select * FROM (
SELECT * FROM notifications n
WHERE group_id = 'A'
GROUP BY `from`, `group_id`
) x
Above two queries give always the same results for columns from
and group_id
, other columns (not included in the GROUP BY clause`) can be random.
Take a look at simple demo --> http://www.sqlfiddle.com/#!2/5d19b/5
Upvotes: 2