sharf
sharf

Reputation: 2143

MySQL JOIN on MAX

I have two My SQL Tables I am trying to join, they are simplified as:

+----------------------------+
| customers                  |
+-------------+-------+------+
| customer_id | first | last |
+-------------+-------+------+
| 0           | John  | Doe  |
+-------------+-------+------+
| 1           | Jane  | Doe  |
+-------------+-------+------+

+-------------------------------------------------------------------+
| contact_log                                                       |
+----------------+-------------+--------------+---------------------+
| contact_log_id | customer_id | contact_type | date_time           |
+----------------+-------------+--------------+---------------------+
| 0              | 0           | email        | 2016-05-17 03:21:45 |
+----------------+-------------+--------------+---------------------+
| 1              | 0           | phone        | 2016-05-17 16:11:35 |
+----------------+-------------+--------------+---------------------+
| ...            | ...         | ...          |                     |
+----------------+-------------+--------------+---------------------+

I need a query that will select the customers, and their most recent contact time and type. I've tried this query:

SELECT
    `customers`.`customer_id`,
    `customers`.`first`,
    `customers.last`,
    `contact_log`.`contact_type`,
    MAX(`contact_log`.`date_time`)
FROM
    `customers`
JOIN
    `contact_log`
ON
    `customers`.`customer_id` = `contact_log`.`customer_id`

This usually sorts the date_time incorrectly. Upon researching the issue, there is a bug in some MySQL versions where MAX and MIN don't work correctly with DATETIME. So the workaround is

MAX(CAST(`contact_log`.`date_time` AS CHAR))

So then I get the customer rows, with the latest date_time. However, the contact_type does not match the time. In the example data, my result looks like:

+-------------+-------+------+--------------+---------------------+
| customer_id | first | last | contact_type | date_time           |
+-------------+-------+------+--------------+---------------------+
| 0           | John  | Doe  | email        | 2016-05-17 16:11:35 |
+-------------+-------+------+--------------+---------------------+

The contact_type does not match that of the date_time from the contact_log table. I suspect this has to do with the order the SELECT/JOIN is happening, and when it is filtered. I have to be careful of sub queries (to avoid n+1) because these are very large tables and will be selecting potentially hundreds of rows from both tables.

What is the proper query to get the contact_type and date_time to match?

Update When I originally asked this question I did not realize you can't have sub-queries in a view. This needs to be saved as a view. For completeness of this question, how would this be broken into multiple views and joined into one?

Upvotes: 4

Views: 69

Answers (3)

Blank
Blank

Reputation: 12378

Without subquery, here is a solution used having clause:

select c.*, cl.contact_type, cl.date_time
from customers c
join contact_log cl
on c.customer_id = cl.customer_id
left join contact_log t
on cl.customer_id = t.customer_id
and cl.date_time <= t.date_time
group by c.customer_id, c.`first`, c.`last`, cl.contact_type, cl.date_time
having count(*) <= 1

Demo Here

Upvotes: 0

berty
berty

Reputation: 2206

Without views

A simple solution is to use a subquery to get the contact logs ordered by date, called by a global query to group them by customer_id :

SELECT * FROM
(
    SELECT
        customers.customer_id,
        customers.first,
        customers.last,
        contact_log.contact_type,
        contact_log.date_time

        FROM customers
            INNER JOIN contact_log ON contact_log.customer_id = customers.customer_id -- or LEFT JOIN - see comment

        ORDER BY contact_log.date_time DESC
) logs GROUP BY logs.customer_id

If you have a huge database, you'll have to check if the schema is correctly indexed, cache enabled, etc...

With views

The logic is the same. The subquery is replaced by a first views, which is requested by a "global" view to group results. Note that I used GROUP BY and not ORDER BY in the "logs" view.

CREATE VIEW logs AS 
    SELECT
        customers.customer_id,
        customers.first,
        customers.last,
        contact_log.contact_type,
        contact_log.date_time

        FROM customers
            LEFT JOIN contact_log ON contact_log.customer_id = customers.customer_id

        GROUP BY
            customers.customer_id,
            contact_log.date_time DESC,
            contact_log.contact_type DESC;

CREATE VIEW testview AS SELECT * FROM logs GROUP BY logs.customer_id;

SELECT * FROM testview;

Upvotes: 2

Nir Levy
Nir Levy

Reputation: 12953

your problem is that you are using MAX without GROUP BY, so you are getting the max data over all records and not per user.
I'd use an inner query to get only the max dates, and join on it:

SELECT
    customers.customer_id,
    customers.first,
    customers.last,
    max_contact_log.contact_type,
    max_contact_log.date_time
FROM
    customers
JOIN
    (select customer_id, contact_type, max(date_time) AS date_time
      FROM contact_log GROUP BY customer_id
    ) as max_contact_log 
ON
    customers.customer_id = max_contact_log.customer_id;

Upvotes: 0

Related Questions