Matthew Poer
Matthew Poer

Reputation: 1682

MySQL to use Group By to pull other varchar field from same row

I have a transactions table structured like this:

+--------------------------------------+--------------------------+------------+
| contact_id                           | return_reason            | date       |
+--------------------------------------+--------------------------+------------+
| 2091c2ed-8f9b-bcfe-1884-50d3ab2cd02d | R01 - Insufficient Funds | 2014-01-25 |
| 2091c2ed-8f9b-bcfe-1884-50d3ab2cd02d | R08 - Payment Stopped    | 2013-09-15 |
| 2091c2ed-8f9b-bcfe-1884-50d3ab2cd02d | R01 - Insufficient Funds | 2013-08-15 |
| 2091c2ed-8f9b-bcfe-1884-50d3ab2cd02d | R01 - Insufficient Funds | 2013-07-31 |
| 2091c2ed-8f9b-bcfe-1884-50d3ab2cd02d | R01 - Insufficient Funds | 2013-05-31 |
| 10101a4f-eaf8-b05a-4813-51a682df2189 | R08 - Payment Stopped    | 2013-03-15 |
| 10101a4f-eaf8-b05a-4813-51a682df2189 | R08 - Payment Stopped    | 2013-04-15 |
| 10101a4f-eaf8-b05a-4813-51a682df2189 | R08 - Payment Stopped    | 2013-05-15 |
+--------------------------------------+--------------------------+------------+

The data I want to find is the most recent return_reason per contact_id.

My current query is this:

select contact_id,return_reason as most_recent_return_reason,max(date) as ram_date from transactions group by contact_id order by ram_date desc;

The result of my query is pulling the right return_reason but I'm not sure if it's doing it because it's accurate/correct of if it's because the odds are in my favor. I'm afraid because this very similar query pulls the wrong date value:

select contact_id,return_reason as most_recent_return_reason,date as ram_date from transactions group by contact_id order by ram_date desc;

Upvotes: 0

Views: 1531

Answers (3)

Cédric Mourizard
Cédric Mourizard

Reputation: 576

In fact, the MAX operator in select is a formater of your result and not an operator applies on your result.

You can do something like that

SELECT contact_id,return_reason AS most_recent_return_reason, date AS ram_date
FROM transactions AS t
WHERE t.date = (
  SELECT MAX(t2.date)
  FROM transactions AS t2
  WHERE t2.contact_id = t.contact_id
);

Additionally, you can see the diff on this SQL Fiddle: http://sqlfiddle.com/#!2/c3687/12

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269445

As mentioned by Fabien, you are just being lucky, because you are using a MySQL extension to the group by node. As documented here, columns in the select that are not in the group by and don't have aggregation functions are assigned values from arbitrary rows. That means that return_reason comes from an arbitrary row, not the one with the max date.

Here is one pretty simple way to get the most recent reason:

select contact_id,
       substring_index(group_concat(return_reason order by date desc), ',', 1
                      ) as most_recent_return_reason,
       max(date) as ram_date
from transactions
group by contact_id
order by ram_date desc;

Upvotes: 1

Fabien Warniez
Fabien Warniez

Reputation: 2741

You are right in saying that the "odds" are in your favor. This query wouldn't be valid in most other DBMS because "return_reason" is neither in the group by clause, nor an aggregate function, but MySQL is more permissive and lets you run it. The result is undefined though.

What you need is the aggregate function FIRST() and LAST(), which sadly don't exist in MySQL.

SOLUTION: You can do what you want to do by using GROUP_CONCAT(). Look into the documentation to see how it works: http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat

Upvotes: 1

Related Questions