cghrmauritius
cghrmauritius

Reputation: 177

MySql select statement get most recent row for each client_id

Hi i have an issue with a mysql select statement i cant get my head around,

Table client_directory_data

id int, verified int, client_id int, created timestamp, description longtext

select * from client_directory_data where verified = 1 order by created desc

but this selects multiple rows for each client_id

what i need to do is to select every client_id which has a verified = 1 but only get the most recent row for each client_id, i hope that makes sense.

Upvotes: 2

Views: 1437

Answers (3)

Braiba
Braiba

Reputation: 571

This is an issue I face all the time. Fortunately there's a nice little trick for doing this:

SELECT
client_id,
SUBSTRING_INDEX(GROUP_CONCAT(id ORDER BY created DESC),",",1) AS `id`
FROM client_directory_data
WHERE verified = 1
GROUP BY client_id

And if you want the whole row you can just join onto it like so:

SELECT
*
FROM (
  SELECT
  client_id,
  SUBSTRING_INDEX(GROUP_CONCAT(id ORDER BY created DESC),",",1) AS `id`
  FROM client_directory_data
  WHERE verified = 1
  GROUP BY client_id
) ids
JOIN client_directory_data USING (id);

Of course if you're ordering by an indexed field anyway (that you could therefore join on efficiently anyway), it's better to use MAX(id) AS id, although it actually has very little impact on performance. The main reason to use MAX() is really to make the code a little simpler. It also avoids the pitfalls you may encounter if the field contains commas (which you can get around with a different seperator for the group concat) or hitting the max GROUP_CONCAT length (which can be extended with SET group_concat_max_len = xxx; and only causes warnings anyway).

I can see why this would intuitively seem like it would have performance issues, however it's actually the best performng method I've found for these queries - especially on large tables.

Here are some benchmarks I've taken from some of the larger tables currently available to me comparing the three methods in this thread.

Query A: (~5,000 records, ~900 results, non-indexed field)

  • GROUP_CONCAT method: 0.0100 seconds
  • MAX method: 0.102 seconds
  • LEFT JOIN method: 0.0082 seconds

Query B : (~300,000 records, ~95,000 results)

  • GROUP_CONCAT method: 1.8618 seconds
  • MAX method: 1.7904 seconds
  • LEFT JOIN method: 6.4649 seconds

Query C : (~300,000 records, ~7 results)

  • GROUP_CONCAT method: 0.103 seconds
  • MAX method: 0.0102 seconds
  • LEFT JOIN method: (I got bored after 4 hours)

Query D : (~500,000 records, ~5,000 different values of the field being grouped)

  • GROUP method: 0.1355 seconds
  • MAX Method : 0.0429 seconds
  • LEFT JOIN method: (I got bored after 10 minutes)

Upvotes: 4

Olivier Coilland
Olivier Coilland

Reputation: 3096

That makes sense and is a classic question.

Assuming that the most recent row is the one with highest id, you can use:

SELECT *
FROM client_directory_data c
LEFT JOIN client_directory_data d ON c.client_id = d.client_id AND d.verified = 1 AND d.id > c.id
WHERE d.id IS NULL
    AND c.verified = 1;

You can have an explanation of this query pattern here.

Upvotes: 3

user1437178
user1437178

Reputation:

Make id as primary key for the table client_directory_data

Upvotes: 0

Related Questions