Reputation: 78922
I am storing the response to various rpc calls in a mysql table with the following fields:
Table: rpc_responses
timestamp (date)
method (varchar)
id (varchar)
response (mediumtext)
PRIMARY KEY(timestamp,method,id)
What is the best method of selecting the most recent responses for all existing combinations of method
and id
?
For each date there can only be one response for a given method/id.
Not all call combinations are necessarily present for a given date.
There are dozens of methods, thousands of ids and at least 365 different dates
Sample data:
timestamp method id response
2009-01-10 getThud 16 "....."
2009-01-10 getFoo 12 "....."
2009-01-10 getBar 12 "....."
2009-01-11 getFoo 12 "....."
2009-01-11 getBar 16 "....."
Desired result:
2009-01-10 getThud 16 "....."
2009-01-10 getBar 12 "....."
2009-01-11 getFoo 12 "....."
2009-01-11 getBar 16 "....."
(I don't think this is the same question - it won't give me the most recent response
)
Upvotes: 25
Views: 104861
Reputation: 9610
Checking the three main answers in some other use case shows that the most voted answer is also by far the fastest, swarm intelligence works here:
# Answer 1: https://stackoverflow.com/a/12625667/11154841
# 165ms
SELECT
COUNT(0)
FROM
(
SELECT
mtn.my_primary_key,
mtn.my_info_col,
IF(@last_my_primary_key = my_primary_key,
0,
1) AS new_my_primary_key_group,
@last_my_primary_key := my_primary_key
FROM
my_db_schema.my_table_name mtn
WHERE
mtn.date_time_col > now() - INTERVAL 1 MONTH
ORDER BY
my_primary_key,
mtn.date_time_col DESC
) AS t1
WHERE
new_my_primary_key_group = 1
AND t1.my_info_col = 'delete';
# Answer 2: https://stackoverflow.com/a/435709/11154841
# 757ms
SELECT
count(0)
FROM
my_db_schema.my_table_name mtn
JOIN
(
SELECT
my_primary_key,
max(date_time_col) AS date_time_col
FROM
my_db_schema.my_table_name mtn
WHERE
mtn.date_time_col > now() - INTERVAL 1 MONTH
GROUP BY
mtn.my_primary_key) latest
USING (my_primary_key,
date_time_col)
WHERE
mtn.my_info_col = 'delete';
# Answer 3: https://stackoverflow.com/a/3185644/11154841
# 1.310s
SELECT
count(0)
FROM
my_db_schema.my_table_name mtn
WHERE
mtn.date_time_col = (
SELECT
max(mtn2.date_time_col)
FROM
my_db_schema.my_table_name mtn2
WHERE
mtn2.my_primary_key = mtn.my_primary_key
AND mtn2.date_time_col > now() - INTERVAL 1 MONTH
)
AND mtn.date_time_col > now() - INTERVAL 1 MONTH
AND mtn.my_info_col = 'delete';
Upvotes: 0
Reputation: 78922
Self answered, but I'm not sure that it will be an efficient enough solution as the table grows:
SELECT timestamp,method,id,response FROM rpc_responses
INNER JOIN
(SELECT max(timestamp) as timestamp,method,id FROM rpc_responses GROUP BY method,id) latest
USING (timestamp,method,id);
Upvotes: 15
Reputation: 6516
This solution was updated recently.
Comments below may be outdated
This can query may perform well, because there are no joins.
SELECT * FROM (
SELECT *,if(@last_method=method,0,1) as new_method_group,@last_method:=method
FROM rpc_responses
ORDER BY method,timestamp DESC
) as t1
WHERE new_method_group=1;
Given that you want one resulting row per method
this solution should work, using mysql variables to avoid a JOIN.
FYI, PostgreSQL has a way of doing this built into the language:
SELECT DISTINCT ON (method) timestamp, method, id, response
FROM rpc_responses
WHERE 1 # some where clause here
ORDER BY method, timestamp DESC
Upvotes: 30
Reputation: 61
Try this...
SELECT o1.id, o1.timestamp, o1.method, o1.response
FROM rpc_responses o1
WHERE o1.timestamp = ( SELECT max(o2.timestamp)
FROM rpc_responses o2
WHERE o1.id = o2.id )
ORDER BY o1.timestamp, o1.method, o1.response
...it even works in Access!
Upvotes: 6
Reputation: 17
Subquery is very taxing when the data set becomes larger.
Try this:
SELECT t1.*
FROM rpc_responses AS t1
INNER JOIN rpc_responses AS t2
GROUP BY t1.method, t1.id, t1.timestamp
HAVING t1.timestamp=MAX(t2.timestamp)
ORDER BY t1.timestamp, t1.method, t1.response;
Upvotes: 0
Reputation: 307
i used this,worked for me
select max(timestamp),method,id from tables where 1 group by method,id order by timestamp desc
Upvotes: 0
Reputation: 2296
The concept of "most recent" is fairly vague. If you mean something like the 100 most recent rows then you can just add a TOP(100)
to your SELECT
clause.
If you mean the "most recent" based on a the most recent date then you can just do
SELECT timestamp,method,id,response
FROM rpc_responses
HAVING max(timestamp) = timestamp
Upvotes: -1
Reputation: 41
...is more than one year later but i might help someone To select all the queries starting from latest
SELECT *
FROM rpc_responses
ORDER BY timestamp DESC
Upvotes: -4