Kartik
Kartik

Reputation: 7917

SQL query to select all rows with max column value

CREATE TABLE `user_activity` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` int(11) DEFAULT NULL,
  `type` enum('request','response') DEFAULT NULL,
  `data` longtext NOT NULL,
  `created_at` datetime DEFAULT NULL,
  `source` varchar(255) DEFAULT NULL,
  `task_name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
);

I have this data:-

enter image description here

Now I need to select all rows for user_id=527 where created_at value is the maximum. So I need the last 3 rows in this image.

I wrote this query:-

SELECT * 
FROM   user_activity 
WHERE  user_id = 527 
       AND source = 'E1' 
       AND task_name IN ( 'GetReportTask', 'StopMonitoringUserTask' ) 
       AND created_at = (SELECT Max(created_at) 
                         FROM   user_activity 
                         WHERE  user_id = 527 
                                AND source = 'E1' 
                                AND task_name IN ( 'GetReportTask', 
                                                   'StopMonitoringUserTask' ));

This is very inefficient because I am running the exact same query again as an inner query except that it disregards created_at. What's the right way to do this?

Upvotes: 1

Views: 477

Answers (3)

Kartik
Kartik

Reputation: 7917

I used EverSQL and applied my own changes to come up with this single-select query that uses self-join:-

SELECT * 
FROM   user_activity AS ua1 
       LEFT JOIN user_activity AS ua2 
              ON ua2.user_id = ua1.user_id 
                 AND ua2.source = ua1.source 
                 AND ua2.task_name IN ( 'GetReportTask', 'StopMonitoringUserTask' ) 
                 AND ua1.created_at < ua2.created_at 
WHERE  ua1.user_id = 527 
       AND ua1.source = 'E1' 
       AND ua1.task_name IN ( 'GetReportTask', 'StopMonitoringUserTask' ) 
       AND ua2.created_at IS NULL;

However, I noticed that the response times of both queries were similar. I tried to use Explain to identify any performance differences; and from what I understood from its output, there are no noticeable differences because proper indexing is in place. So for readability and maintainability, I'll just use the nested query.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1270853

I would use a correlated subquery:

SELECT ua.* 
FROM user_activity ua
WHERE ua.user_id = 527 AND source = 'E1' AND
      ua.task_name IN ('GetReportTask', 'StopMonitoringUserTask' ) AND
      ua.created_at = (SELECT MAX(ua2.created_at) 
                       FROM user_activity ua2
                       WHERE ua2.user_id = ua.user_id AND 
                             ua2.source = ua.source AND
                             ua2.task_name IN ( 'GetReportTask', 'StopMonitoringUserTask' )
                      );

Although this might seem inefficient, you can create an index on user_activity(user_id, source, task_name, created_at). With this index, the query should have decent performance.

Upvotes: 1

kurt
kurt

Reputation: 1156

Order by created_at desc and limit your query to return 1 row.

SELECT * 
FROM   user_activity 
WHERE  user_id = 527 
   AND source = 'E1' 
   AND task_name IN ( 'GetReportTask', 'StopMonitoringUserTask' ) 
ORDER BY created_at DESC
LIMIT 1;

Upvotes: 0

Related Questions