Reputation: 1682
I have a transactions table structured like this:
+--------------------------------------+--------------------------+------------+
| contact_id | return_reason | date |
+--------------------------------------+--------------------------+------------+
| 2091c2ed-8f9b-bcfe-1884-50d3ab2cd02d | R01 - Insufficient Funds | 2014-01-25 |
| 2091c2ed-8f9b-bcfe-1884-50d3ab2cd02d | R08 - Payment Stopped | 2013-09-15 |
| 2091c2ed-8f9b-bcfe-1884-50d3ab2cd02d | R01 - Insufficient Funds | 2013-08-15 |
| 2091c2ed-8f9b-bcfe-1884-50d3ab2cd02d | R01 - Insufficient Funds | 2013-07-31 |
| 2091c2ed-8f9b-bcfe-1884-50d3ab2cd02d | R01 - Insufficient Funds | 2013-05-31 |
| 10101a4f-eaf8-b05a-4813-51a682df2189 | R08 - Payment Stopped | 2013-03-15 |
| 10101a4f-eaf8-b05a-4813-51a682df2189 | R08 - Payment Stopped | 2013-04-15 |
| 10101a4f-eaf8-b05a-4813-51a682df2189 | R08 - Payment Stopped | 2013-05-15 |
+--------------------------------------+--------------------------+------------+
The data I want to find is the most recent return_reason per contact_id.
My current query is this:
select contact_id,return_reason as most_recent_return_reason,max(date) as ram_date from transactions group by contact_id order by ram_date desc;
The result of my query is pulling the right return_reason but I'm not sure if it's doing it because it's accurate/correct of if it's because the odds are in my favor. I'm afraid because this very similar query pulls the wrong date value:
select contact_id,return_reason as most_recent_return_reason,date as ram_date from transactions group by contact_id order by ram_date desc;
Upvotes: 0
Views: 1531
Reputation: 576
In fact, the MAX operator in select is a formater of your result and not an operator applies on your result.
You can do something like that
SELECT contact_id,return_reason AS most_recent_return_reason, date AS ram_date
FROM transactions AS t
WHERE t.date = (
SELECT MAX(t2.date)
FROM transactions AS t2
WHERE t2.contact_id = t.contact_id
);
Additionally, you can see the diff on this SQL Fiddle: http://sqlfiddle.com/#!2/c3687/12
Upvotes: 1
Reputation: 1269445
As mentioned by Fabien, you are just being lucky, because you are using a MySQL extension to the group by
node. As documented here, columns in the select
that are not in the group by
and don't have aggregation functions are assigned values from arbitrary rows. That means that return_reason
comes from an arbitrary row, not the one with the max date.
Here is one pretty simple way to get the most recent reason:
select contact_id,
substring_index(group_concat(return_reason order by date desc), ',', 1
) as most_recent_return_reason,
max(date) as ram_date
from transactions
group by contact_id
order by ram_date desc;
Upvotes: 1
Reputation: 2741
You are right in saying that the "odds" are in your favor. This query wouldn't be valid in most other DBMS because "return_reason" is neither in the group by clause, nor an aggregate function, but MySQL is more permissive and lets you run it. The result is undefined though.
What you need is the aggregate function FIRST() and LAST(), which sadly don't exist in MySQL.
SOLUTION: You can do what you want to do by using GROUP_CONCAT(). Look into the documentation to see how it works: http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat
Upvotes: 1