Steve
Steve

Reputation: 3095

MySQL query looking for 3 identical transactions over a very large group of transactions

We have a table of transactions that holds an amount, a user ID, a partner ID, and a description. When a user has 3 of a specific transaction they get a one-time bonus. The query we have worked OK on smaller sets, but now that we're at over 20,000,000 transaction rows it's just not cutting it. There must be a better way to approach this.

SELECT t.str_spid, count(*) AS C
  FROM transactions t
 WHERE t.str_prid = 150
   AND (t.str_desc = 'Annual Rewards' OR t.str_desc = 'Annual Rewards (PRO)')
   AND t.str_amount = 1000
   AND t.str_spid NOT IN (
        SELECT x.str_spid 
          FROM transactions x
         WHERE x.str_prid = 150
           AND x.str_amount = 2500
      GROUP BY x.str_spid
      )
GROUP BY t.STR_SPID 
HAVING C = 3

My code as a limit for 50 rows at a time but even that is choking down. There are apporximately 3 million people in this group which constitutes around 12,000,000 total transactions. The subquery alone is just shy of 1,000,000 rows.

One idea is to tag users who have received the bonus already so that they're skipped going forward. It would eliminate the need for the subquery, but I'd have to join over to the users table (t.str_serid = u.us_userid) to see if the flag was set or not.

Any tips, pointers, ideas etc. would be appreciated.

Upvotes: 0

Views: 29

Answers (2)

Uueerdo
Uueerdo

Reputation: 15961

If reviewing your indexes provides no solutions, this is one of the situations I would suggest trying out a correlated subquery. Changing this:

   AND t.str_spid NOT IN (
        SELECT x.str_spid 
          FROM transactions x
         WHERE x.str_prid = 150
           AND x.str_amount = 2500
      GROUP BY x.str_spid
      )

to this:

   AND NOT EXISTS (
        SELECT x.str_spid 
        FROM transactions x
        WHERE x.str_spid = t.str_spid
           AND x.str_prid = 150
           AND x.str_amount = 2500
      )

Also, while I doubt it makes a difference, DISTINCT is more appropriate for your original subquery than a GROUP BY with no aggregate functions.


Another option is a LEFT JOIN, which would actually be my more likely first choice:

SELECT t.str_spid, count(*) AS C
FROM transactions AS t
LEFT JOIN transactions AS x
   ON t.str_spid = x.str_spid
   AND x.str_prid = 150
   AND x.str_amount = 2500
WHERE t.str_prid = 150
   AND (t.str_desc = 'Annual Rewards' OR t.str_desc = 'Annual Rewards (PRO)')
   AND t.str_amount = 1000
   AND x.str_spid IS NULL
GROUP BY t.STR_SPID 
HAVING C = 3

Upvotes: 1

ScaisEdge
ScaisEdge

Reputation: 133400

be sure you have proper indexes

table transactions 

a composite index  on columns (str_prid,str_prid)

and

a second

composite index  on columns ( str_desc, str_amount,str_spid  )

Upvotes: 0

Related Questions