skyork
skyork

Reputation: 7391

Select the latest 3 records for each ID in a table

I have a table with a composite primary key (ID, Date) like below.

+------+------------+-------+
|  ID  |    Date    | Value |
+------+------------+-------+
|   1  | 1433419200 |   15  |
|   1  | 1433332800 |   23  |
|   1  | 1433246400 |   41  |
|   1  | 1433160000 |   55  |
|   1  | 1432900800 |   24  |
|   2  | 1433419200 |   52  |
|   2  | 1433332800 |   23  |
|   2  | 1433246400 |   39  |
|   2  | 1433160000 |   22  |
|   3  | 1433419200 |   11  |
|   3  | 1433246400 |   58  |
|  ... |    ...     |  ...  |
+------+------------+-------+

There is also a separate index on Date column. The table is of moderate size, currently ~600k row and growing by ~2k everyday.

I want to do a single SELECT query that returns the latest 3 records (ordered by Date timestamp) for each ID. For each given ID, the Date values are always unique, so no need to worry about ties for Date here.

I've tried a self-join approach, inspired by this answer, but it took quite a few seconds to run and returned nothing:

SELECT p1.ID, p1.Date, p1.Value FROM MyTable AS p1
LEFT JOIN MyTable AS p2 
ON p1.ID=p2.ID AND p1.Date<=p2.Date
GROUP BY p1.ID
HAVING COUNT(*)<=5
ORDER BY p1.ID, p1.Date DESC;

What would be a fast solution here?

Upvotes: 6

Views: 19565

Answers (3)

Menandro Ocampo
Menandro Ocampo

Reputation: 1

SELECT distinct x.ID,x.Date,X.Value
FROM ( SELECT DISTINCT ID FROM XXXTable  ) c
    CROSS APPLY (

    select top 3 A.ID,a.Date,Value,[Count] from (
    SELECT distinct ID,Date,Value, ROW_NUMBER()
    over (
        PARTITION BY ID
        order by Date
    ) AS [Count]  where c.ID = t.ID


    ) A  order by [Count] desc

Upvotes: 0

CL.
CL.

Reputation: 180070

You could look up the three most recent dates for each ID:

SELECT ID, Date, Value
FROM MyTable
WHERE Date IN (SELECT Date
               FROM MyTable AS T2
               WHERE T2.ID = MyTable.ID
               ORDER BY Date DESC
               LIMIT 3)

Alternatively, look up the third most recent date for each ID, and use it as a limit:

SELECT ID, Date, Value
FROM MyTable
WHERE Date >= IFNULL((SELECT Date
                      FROM MyTable AS T2
                      WHERE T2.ID = MyTable.ID
                      ORDER BY Date DESC
                      LIMIT 1 OFFSET 2),
                     0)

Both queries should get good performance from the primary key's index.

Upvotes: 14

Gordon Linoff
Gordon Linoff

Reputation: 1269873

First, here is the correct query for the inequality method:

SELECT p1.ID, p1.Date, p1.Value
FROM MyTable p1 LEFT JOIN
     MyTable AS p2 
     ON p1.ID = p2.ID AND p2.Date <= p1.Date
--------------------------^ fixed this condition
GROUP BY p1.ID, p1.Date, p1.Value
HAVING COUNT(*) <= 5
ORDER BY p1.ID, p1.Date DESC;

I'm not sure if there is a fast way to do this in SQLite. In most other databases, you can use the ANSI standard row_number() function. In MySQL, you can use variables. Both of these are difficult in SQLite. Your best solution may be to use a cursor.

The above can benefit from an index on MyTable(Id, Date).

Upvotes: 3

Related Questions