Dominic Rodger
Dominic Rodger

Reputation: 99841

sqlite query optimisation

I've got a sqlite table actions that looks something like this:

uuid varchar (36)
actiondate int
username varchar (16)
mood int
bonus int
status varchar (80)
... bunch of other similar fields (all short varchar or int fields)

This design seems to be sufficiently performant for most types of queries, but struggles a bit with a particular scenario, where I need to get some data about the latest action performed by each user as of a given date.

I was hoping to be able to do something like this:

SELECT status, actiondate
FROM actions WHERE actiondate < 20061231
GROUP BY username
ORDER BY actiondate DESC

However, the aggregation is not done with respect to the order clause, the order clause just determines what order the results are returned in, which makes sense.

So, I have this:

SELECT actiondate, status FROM actions
WHERE actiondate < 20061231 and
uuid = (SELECT uuid from actions as alt
        WHERE alt.username = actions.username
        ORDER BY actiondate DESC LIMIT 1)

Is there a better way of doing this sort of query? A better table layout? Currently this sort of query is taking ~400ms on my development box, and it'd be nice if I could shave 100ms or so off it (my target time is actually 100ms, but I'm skeptical as to whether that's manageable).

I've obviously got indexes on username and date (I've actually got several: one which is which seems to fit the slow query quite well; one on username; one on date ASC; one on date DESC and one on uuid).

FWIW, the action table's likely to have somewhere between 100 and 30,000 rows in it.

Upvotes: 1

Views: 914

Answers (2)

Cade Roux
Cade Roux

Reputation: 89741

Your index should cover all the columns used in the query for maximum performance.

I'm not sure about the performance of nested query in this case. I'd prefer to join to a subquery if the execution plan doesn't show that it's converting it to a good nested join.

For something like this, I might avoid the UUID if possible, and if not, I would ensure that it's increasing, so you could write:

SELECT actiondate
    ,status
FROM actions
INNER JOIN (
    SELECT username
        ,MAX(uuid) as last_uuid from actions
    WHERE actiondate < 20061231
    GROUP BY username
) AS last_occur
    ON last_occur.username = actions.username
    AND last_occur.last_uuid = actions.uuid
WHERE actiondate < 20061231

I would think this should perform well with an index on username ASC, uuid DESC, INCLUDE (actiondate) and and index on actiondate DESC, username ASC, INCLUDE (status), but obviously look at the query plan.

Without the increasing uuids, you will need some kind of rule to ensure you are selecting the latest action for a person, since unless username, actiondate is unique, there is nothing in your original ORDER BY actiondate DESC limit 1 to ensure you are picking the correct row each time. If username, actiondate is unique, then you can use the following:

SELECT actiondate
    ,status
FROM actions
INNER JOIN (
    SELECT username
        ,MAX(actiondate) as last_actiondate from actions
    WHERE actiondate < 20061231
    GROUP BY username
) AS last_occur
    ON last_occur.username = actions.username
    AND last_occur.last_actiondate = actions.actiondate
WHERE actiondate < 20061231

If it is not unique, it will still work, but you will get multiple actions for a person on their last actiondate. The recommended indexes would also be different in this case (and better), because the large uuid is not necessary.

Upvotes: 1

Alex Martelli
Alex Martelli

Reputation: 882851

Correctness before speed -- your query:

SELECT actiondate, status FROM actions
WHERE actiondate < 20061231 and
uuid = (SELECT uuid from actions as alt
        WHERE alt.username = actions.username
        ORDER BY actiondate DESC LIMIT 1)

doesn't perform the task you describe -- the inner select may return a uuid for an action that's later than 2061231, then the outer select will give no result for that username. I think you can fix it my moving the WHERE check on actiondate as an AND in the nested select. (I doubt this will speed things up, but at least it should make the behavior correct -- let us know about how, if at all, it affects the speed!).

Upvotes: 2

Related Questions