BenMorel
BenMorel

Reputation: 36484

LIMITing a SQL JOIN, with JOIN conditions

I have a problem similar to LIMITing a SQL JOIN, but with a slightly more complex requirement.

I want to search for Users and associated Transactions, which lie within a time range:

SELECT u.*, t.*
FROM User u
JOIN Transaction t ON t.user_id = u.id
WHERE t.timestamp >= ? and t.timestamp <= ?;

So far, so good. Now I want to repeat the query, but with a LIMIT on the number of users returned. There should be no limit on the number of transactions returned for a given user, though.

If I follow the approach suggested in the other question, this would translate into:

SELECT u.*, t.*
FROM (SELECT * FROM User LIMIT 10) u
JOIN Transaction t ON t.user_id = u.id
WHERE t.timestamp >= ? and t.timestamp <= ?;

This will not produce what I want: it will return the first 10 users, who might not have any transactions associated.

I want to return 10 users who have at least one associated transaction in the given time range.

How can I achieve this using MySQL?

Upvotes: 4

Views: 144

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269633

You can do this without variables, but it requires repeating the join logic:

SELECT u.*, t.*
FROM (SELECT *
      FROM User
      WHERE EXISTS (SELECT 1
                    FROM Transaction t
                    WHERE t.user_id = u.id AND
                          t.timestamp >= ? and t.timestamp <= ?
                   )
      LIMIT 10
     ) u JOIN
     Transaction t
     ON t.user_id = u.id
WHERE t.timestamp >= ? and t.timestamp <= ?;

EDIT:

Probably the fastest answer is something like this:

select u.*, t.*
from (select user_id
      from (select user_id
            from transaction t
            where t.timestamp >= ? and t.timestamp <= ?
            limit 1000
           ) t
      limit 30
     ) tt join
     user u
     on tt.userid = u.id join
     transaction t
     on tt.userid = t.userid and t.timestamp >= ? and t.timestamp <= ?;

The first subquery chooses 1,000 matching records in the transaction table. My guess is that this is more than enough to get 30 users. This list is then joined to the user and transaction table to get the final results. By limiting the list without having to do a full table scan, the first query should be pretty fast . . . especially with an additional index on (timestamp, user).

Upvotes: 1

Giorgos Betsos
Giorgos Betsos

Reputation: 72165

You can use variables for this:

SELECT *
FROM (
  SELECT *,
         @rn := IF(@uid = user_id, @rn,
                   IF(@uid := user_id, @rn +1, @rn + 1)) AS rn 
  FROM (
     SELECT u.*, t.*
     FROM User u
     JOIN Transaction t ON t.user_id = u.id
     WHERE t.timestamp >= x and t.timestamp <= y) AS t
  CROSS JOIN (SELECT @rn := 0, @uid := 0) AS vars
  ORDER BY user_id) AS x
WHERE x.rn <= 10

Variable @rn is incremented by 1 every time a new user is returned by the query. So we can control the number of users returned using @rn <= 10.

Upvotes: 1

Related Questions