Worka
Worka

Reputation: 45

How many users have 2 or more transactions in period of n-days without skipping even a day

I need help to find and count orders in certain consecutive period of time from 'sales_track' table from users that has minimum of two or more transactions (rephrase: How many users have 2 or more transactions in period of n-days without skipping even a day)

sales_track

sales_tx_id | u_id | create_date | item_id | price
------------|------|-------------|---------|---------
 ffff-0291  | 0001 | 2019-08-01  | 0300    | 5.00
 ffff-0292  | 0001 | 2019-08-01  | 0301    | 2.50
 ffff-0293  | 0002 | 2019-08-01  | 0209    | 3.50
 ffff-0294  | 0003 | 2019-08-01  | 0020    | 1.00
 ffff-0295  | 0001 | 2019-08-02  | 0301    | 2.50
 ffff-0296  | 0001 | 2019-08-02  | 0300    | 5.00
 ffff-0297  | 0001 | 2019-08-02  | 0209    | 3.50
 ffff-0298  | 0002 | 2019-08-02  | 0300    | 5.00    

For simplicity sake sample is for two consecutive days (period of time is between 2019-08-01 and 2019-08-02) only, in real operation I would have to search eg. 10 consecutive days transaction.

I'm able so far to find the minimum two or more transactions.

SELECT user_id, COUNT (user_id) FROM sales_track WHERE created_at BETWEEN 
('2019-08-01') AND ('2019-08-02') 
GROUP BY u_id HAVING COUNT (sales_tx_id) >= 2;

The output I'm looking for is like:

 u_id | tx_count | tx_amount  
------|----------|------------
 0001 | 5        | 18.50

Thank you in advance your help.

Upvotes: 0

Views: 68

Answers (1)

S-Man
S-Man

Reputation: 23736

step-by-step demo:db<>fiddle

First: My extended data set:

sales_tx_id | user_id | created_at | item_id | price
:---------- | :------ | :--------- | :------ | ----:
ffff-0291   | 0001    | 2019-08-01 | 0300    |  5.00
ffff-0292   | 0001    | 2019-08-01 | 0301    |  2.50
ffff-0293   | 0002    | 2019-08-01 | 0209    |  3.50
ffff-0294   | 0003    | 2019-08-01 | 0020    |  1.00
ffff-0295   | 0001    | 2019-08-02 | 0301    |  2.50
ffff-0296   | 0001    | 2019-08-02 | 0300    |  5.00
ffff-0297   | 0001    | 2019-08-02 | 0209    |  3.50
ffff-0298   | 0002    | 2019-08-02 | 0300    |  5.00
ffff-0299   | 0001    | 2019-08-05 | 0209    |  3.50
ffff-0300   | 0001    | 2019-08-05 | 0020    |  1.00
ffff-0301   | 0001    | 2019-08-06 | 0209    |  3.50
ffff-0302   | 0001    | 2019-08-06 | 0020    |  1.00
ffff-0303   | 0001    | 2019-08-07 | 0209    |  3.50
ffff-0304   | 0001    | 2019-08-07 | 0020    |  1.00
ffff-0305   | 0002    | 2019-08-08 | 0300    |  5.00
ffff-0306   | 0002    | 2019-08-08 | 0301    |  2.50
ffff-0307   | 0001    | 2019-08-09 | 0209    |  3.50
ffff-0308   | 0001    | 2019-08-09 | 0020    |  1.00
ffff-0309   | 0002    | 2019-08-09 | 0300    |  5.00
ffff-0310   | 0002    | 2019-08-09 | 0301    |  2.50
ffff-0311   | 0001    | 2019-08-10 | 0209    |  3.50
ffff-0312   | 0001    | 2019-08-10 | 0020    |  1.00
ffff-0313   | 0002    | 2019-08-10 | 0300    |  5.00

User 1 has 3 streaks:

  • 2019-08-01, 2019-08-02
  • 2019-08-05, 2019-08-06, 2019-08-07
  • 2019-08-09, 2019-08-10

User 2:

  • Has transaction at 2019-08-01, 2019-08-02, but only one each date, so that does not count
  • Has streak on 2019-08-08, 2019-08-09 (2019-08-10 has only one transaction, does not extend streak)

So we are expecting 4 rows: 3 for each user 1 streak, 1 for user 2

SELECT                                       -- 4
    user_id, 
    SUM(count), 
    SUM(price), 
    MIN(created_at) AS consecutive_start
FROM (
    SELECT *,                                -- 3
       SUM(is_in_same_group) OVER (PARTITION BY user_id ORDER BY created_at) AS group_id
    FROM (
        SELECT                               -- 2
            *,
           (lag(created_at, 1, created_at) OVER (PARTITION BY user_id ORDER BY created_at) + 1 <> created_at)::int as is_in_same_group
        FROM (
            SELECT                           -- 1
                created_at,
                user_id,
                COUNT(*),
                SUM(price) AS price
            FROM
                sales_track
            WHERE created_at BETWEEN '2018-02-01' AND '2019-08-11'
            GROUP BY created_at, user_id
            HAVING COUNT(*) >= 2
        ) s
    ) s
) s
GROUP BY user_id, group_id
  1. Grouping all (created_at, user_id) groups and remove those with COUNT() < 2
  2. the lag() window function allows to get the value of the previous record within one ordered group. The group here is the user_id. The check here is: If the current created_at value is the next to the previous (current + 1) then 0, else 1.
  3. Now we can use the cummulative SUM() window function to sum these values: The value increases if the gap is too big (if value is 1) otherwise it is the same value as the previous date. Now we got a group_id for all dates that only differ +1
  4. Finally these groups can be grouped for SUM() and COUNT()

Upvotes: 1

Related Questions