Reputation: 8016
MySQL
Lets say there is a credit card processing company. Every time a credit card is used a row gets inserted into a table.
create table tran(
id int,
tran_dt datetime,
card_id int,
merchant_id int,
amount int
);
One wants to know what cards have been used 3+ times in any 15 minute window at the same merchant.
My attempt:
select card_id, date(tran_dt), hour(tran_dt), merchant_id, count(*)
from tran
group by card_id, date(tran_dt), hour(tran_dt), merchant_id
having count(*)>=3
The first problem is that would give excessive transactions per hour, not per a 15 minute window. The second problem is that would not catch transactions that cross the hour mark ie at 1:59pm and 2:01pm.
To make this simpler, it would ok to split up the hour into 5 minute increments. So we would not have to check 1:00-1:15pm, 1:01-1:16pm, etc. It would be ok to check 1:00-1:15pm, 1:05-1:20pm, etc., if that is easier.
Any ideas how to fix the sql? I have a feeling maybe I need sql window functions, that are not yet available in MySQL. Or write a stored procedure that can look at each 15 block.
http://sqlfiddle.com/#!9/f2d74/1
Upvotes: 12
Views: 1651
Reputation: 6852
I was able to group all rows belonging to the same 15 minute window without duplicate records in the result, using in a single query.
Say your table has:
| id | tran_dt | card_id | merchant_id | amount |
|----|---------------------|---------|-------------|--------|
| 13 | 2015-07-23 16:40:00 | 1 | 1 | 10 |
| 14 | 2015-07-23 16:59:00 | 1 | 1 | 10 | <-- these should
| 15 | 2015-07-23 17:00:00 | 1 | 1 | 10 | <-- be identified
| 16 | 2015-07-23 17:01:00 | 1 | 1 | 10 | <-- in the
| 17 | 2015-07-23 17:02:00 | 1 | 1 | 10 | <-- first group
| 18 | 2015-07-23 17:03:00 | 2 | 2 | 10 |
...
| 50 | 2015-07-24 17:58:00 | 1 | 1 | 10 | <-- and these
| 51 | 2015-07-24 17:59:00 | 1 | 1 | 10 | <-- in the
| 52 | 2015-07-24 18:00:00 | 1 | 1 | 10 | <-- second
The result will be:
| id | card_id | merchant_id | numTrans | amount | dateTimeFirstTrans | dateTimeLastTrans
|----|---------|-------------|----------|--------|---------------------|---------------------
| 14 | 1 | 1 | 4 | 40 | 2015-07-23 16:59:00 | 2015-07-23 17:02:00
| 50 | 1 | 1 | 3 | 30 | 2015-07-24 17:58:00 | 2015-07-24 18:00:00
The query (SQL Fiddle):
select output.* from
(
select
min(subquery.main_id) as id,
subquery.main_card_id as card_id,
subquery.main_merchant_id as merchant_id,
count(subquery.main_id) as numTrans,
sum(subquery.main_amount) as amount,
min(subquery.x_timeFrameStart) as dateTimeFirstTrans,
max(subquery.x_timeFrameStart) as dateTimeLastTrans
from
(
select
main.id as main_id,
main.card_id as main_card_id,
main.merchant_id as main_merchant_id,
main.tran_dt as main_timeFrameStart,
main.amount as main_amount,
main.tran_dt + INTERVAL 15 MINUTE as main_timeFrameEnd,
xList.tran_dt as x_timeFrameStart,
xList.tran_dt + INTERVAL 15 MINUTE as x_timeFrameEnd
from tran as main
inner join tran as xList on /* cross list */
main.card_id = xList.card_id and
main.merchant_id = xList.merchant_id
where
xList.tran_dt between main.tran_dt and main.tran_dt + INTERVAL 15 MINUTE
) as subquery
group by subquery.main_id, subquery.main_card_id, subquery.main_merchant_id, subquery.main_timeFrameStart, subquery.main_timeFrameEnd
having count(subquery.main_id) >= 3
) as output
left join (
select
xList.id as x_id
from tran as main
inner join tran as xList on /* cross list */
main.card_id = xList.card_id and
main.merchant_id = xList.merchant_id and
main.id <> xList.id /* keep only first of the list */
where
xList.tran_dt between main.tran_dt and main.tran_dt + INTERVAL 15 MINUTE
) as exclude on output.id = exclude.x_id
where exclude.x_id is null;
The query is a bit long, and it repeats one subquery just to filter duplicates, so do your testing and tuning to make sure you don't incur in performance problems.
Upvotes: 0
Reputation: 24959
SELECT t1.card_id,t1.merchant_id,count(distinct t1.id)+1 as ChargeCount
FROM tran t1
INNER JOIN tran t2
on t2.card_id=t1.card_id
and t2.merchant_id=t1.merchant_id
and t2.tran_dt <= DATE_ADD(t1.tran_dt, INTERVAL 15 MINUTE)
and t2.id>t1.id
GROUP BY t1.card_id,t1.merchant_id
HAVING ChargeCount>2;
Upvotes: 1
Reputation: 453
An option might be adding a trigger to the tran table on insert that checks the card_id inserted against the previous 15 minutes. If the count is greater than 3 then insert it into an "audit" table that you can query at your leisure.
-- create table to store audited cards
create table audit_cards(
card_id int,
tran_dt datetime
);
-- create trigger on tran table to catch the cards used 3 times in 15 min
CREATE TRIGGER audit_card AFTER INSERT ON tran
BEGIN
if (select count(new.card_id)
from tran
where tran_dt >= (new.tran_dt - INTERVAL 15 MINUTE)) >= 3
THEN
INSERT new.card_id, new.tran_dt into audit_cards;
END;
Then you can run a report on these cards...
select * from audit_cards;
http://dev.mysql.com/doc/refman/5.6/en/trigger-syntax.html
Upvotes: 1
Reputation: 1270091
You can convert the date/time to seconds and do arithmetic on the seconds to get the value within a 15 minute clock interval:
select card_id, min(date(tran_dt)) as first_charge_time, merchant_id, count(*)
from tran
group by card_id, floor(to_seconds(tran_dt) / (60 * 15)), merchant_id
having count(*) >= 3;
The above uses to_seconds()
. In earlier versions of MySQL, you can use unix_timestamp()
.
Getting any 15 minute interval is more challenging. You can express the query as:
select t1.*, count(*) as numTransactions
from tran t1 join
tran t2
on t1.merchant_id = t2.merchanti_d and
t1.card_id = t2.card_id and
t2.tran_dt >= t1.tran_dt and
t2.tran_dt < t1.tran_dt + interval 15 minute
group by t1.id
having numTransactions >= 3;
Performance of this query might be problematic. An index on trans(card_id, merchant_id, tran_dt)
should help a lot.
Upvotes: 2