davidjhp
davidjhp

Reputation: 8016

MySQL how to write SQL to find excessive transactions in 15 minute windows?

MySQL
Lets say there is a credit card processing company. Every time a credit card is used a row gets inserted into a table.

create table tran(
  id int,
  tran_dt datetime, 
  card_id int,
  merchant_id int,
  amount int
);

One wants to know what cards have been used 3+ times in any 15 minute window at the same merchant.

My attempt:

select card_id, date(tran_dt), hour(tran_dt), merchant_id, count(*)
from tran
group by card_id, date(tran_dt), hour(tran_dt), merchant_id
having count(*)>=3

The first problem is that would give excessive transactions per hour, not per a 15 minute window. The second problem is that would not catch transactions that cross the hour mark ie at 1:59pm and 2:01pm.

To make this simpler, it would ok to split up the hour into 5 minute increments. So we would not have to check 1:00-1:15pm, 1:01-1:16pm, etc. It would be ok to check 1:00-1:15pm, 1:05-1:20pm, etc., if that is easier.

Any ideas how to fix the sql? I have a feeling maybe I need sql window functions, that are not yet available in MySQL. Or write a stored procedure that can look at each 15 block.

http://sqlfiddle.com/#!9/f2d74/1

Upvotes: 12

Views: 1651

Answers (4)

Marcos Dimitrio
Marcos Dimitrio

Reputation: 6852

I was able to group all rows belonging to the same 15 minute window without duplicate records in the result, using in a single query.

Say your table has:

| id | tran_dt             | card_id | merchant_id | amount |
|----|---------------------|---------|-------------|--------|
| 13 | 2015-07-23 16:40:00 |       1 |           1 |     10 |
| 14 | 2015-07-23 16:59:00 |       1 |           1 |     10 |  <-- these should
| 15 | 2015-07-23 17:00:00 |       1 |           1 |     10 |  <-- be identified
| 16 | 2015-07-23 17:01:00 |       1 |           1 |     10 |  <-- in the
| 17 | 2015-07-23 17:02:00 |       1 |           1 |     10 |  <-- first group
| 18 | 2015-07-23 17:03:00 |       2 |           2 |     10 |
...
| 50 | 2015-07-24 17:58:00 |       1 |           1 |     10 |  <-- and these
| 51 | 2015-07-24 17:59:00 |       1 |           1 |     10 |  <-- in the
| 52 | 2015-07-24 18:00:00 |       1 |           1 |     10 |  <-- second

The result will be:

| id | card_id | merchant_id | numTrans | amount | dateTimeFirstTrans  | dateTimeLastTrans   
|----|---------|-------------|----------|--------|---------------------|---------------------
| 14 |       1 |           1 |        4 |     40 | 2015-07-23 16:59:00 | 2015-07-23 17:02:00 
| 50 |       1 |           1 |        3 |     30 | 2015-07-24 17:58:00 | 2015-07-24 18:00:00 

The query (SQL Fiddle):

select output.* from
(
    select 
        min(subquery.main_id) as id, 
        subquery.main_card_id as card_id, 
        subquery.main_merchant_id as merchant_id, 
        count(subquery.main_id) as numTrans, 
        sum(subquery.main_amount) as amount,
        min(subquery.x_timeFrameStart) as dateTimeFirstTrans, 
        max(subquery.x_timeFrameStart) as dateTimeLastTrans
    from 
    (
        select
            main.id as main_id, 
            main.card_id as main_card_id, 
            main.merchant_id as main_merchant_id, 
            main.tran_dt as main_timeFrameStart, 
            main.amount as main_amount,
            main.tran_dt + INTERVAL 15 MINUTE as main_timeFrameEnd,
            xList.tran_dt as x_timeFrameStart,
            xList.tran_dt + INTERVAL 15 MINUTE as x_timeFrameEnd
        from tran as main
            inner join tran as xList on  /* cross list */
                main.card_id = xList.card_id and
                main.merchant_id = xList.merchant_id
        where
            xList.tran_dt between main.tran_dt and main.tran_dt + INTERVAL 15 MINUTE
    ) as subquery
    group by subquery.main_id, subquery.main_card_id, subquery.main_merchant_id, subquery.main_timeFrameStart, subquery.main_timeFrameEnd
    having count(subquery.main_id) >= 3 
) as output
    left join (
        select
            xList.id as x_id
        from tran as main
            inner join tran as xList on  /* cross list */
                main.card_id = xList.card_id and
                main.merchant_id = xList.merchant_id and
                main.id <> xList.id  /* keep only first of the list */
        where
            xList.tran_dt between main.tran_dt and main.tran_dt + INTERVAL 15 MINUTE
    ) as exclude on output.id = exclude.x_id
where exclude.x_id is null;

The query is a bit long, and it repeats one subquery just to filter duplicates, so do your testing and tuning to make sure you don't incur in performance problems.

Upvotes: 0

Drew
Drew

Reputation: 24959

SELECT t1.card_id,t1.merchant_id,count(distinct t1.id)+1 as ChargeCount
FROM tran t1 
INNER JOIN tran t2
on t2.card_id=t1.card_id 
and t2.merchant_id=t1.merchant_id 
and t2.tran_dt <= DATE_ADD(t1.tran_dt, INTERVAL 15 MINUTE)
and t2.id>t1.id
GROUP BY t1.card_id,t1.merchant_id
HAVING ChargeCount>2;

Upvotes: 1

Justin Hanley
Justin Hanley

Reputation: 453

An option might be adding a trigger to the tran table on insert that checks the card_id inserted against the previous 15 minutes. If the count is greater than 3 then insert it into an "audit" table that you can query at your leisure.

-- create table to store audited cards    
create table audit_cards(
      card_id int,
      tran_dt datetime
    );

-- create trigger on tran table to catch the cards used 3 times in 15 min
CREATE TRIGGER audit_card AFTER INSERT ON tran
BEGIN
   if (select count(new.card_id) 
          from tran 
             where tran_dt >= (new.tran_dt - INTERVAL 15 MINUTE)) >= 3
   THEN
       INSERT new.card_id, new.tran_dt into audit_cards;
END;

Then you can run a report on these cards...

select * from audit_cards;

http://dev.mysql.com/doc/refman/5.6/en/trigger-syntax.html

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270091

You can convert the date/time to seconds and do arithmetic on the seconds to get the value within a 15 minute clock interval:

select card_id, min(date(tran_dt)) as first_charge_time, merchant_id, count(*)
from tran
group by card_id, floor(to_seconds(tran_dt) / (60 * 15)), merchant_id
having count(*) >= 3;

The above uses to_seconds(). In earlier versions of MySQL, you can use unix_timestamp().

Getting any 15 minute interval is more challenging. You can express the query as:

select t1.*, count(*) as numTransactions
from tran t1 join
     tran t2
     on t1.merchant_id = t2.merchanti_d and
        t1.card_id = t2.card_id and
        t2.tran_dt >= t1.tran_dt and
        t2.tran_dt < t1.tran_dt + interval 15 minute 
group by t1.id
having numTransactions >= 3;

Performance of this query might be problematic. An index on trans(card_id, merchant_id, tran_dt) should help a lot.

Upvotes: 2

Related Questions