Reputation: 45
I need help to find and count orders in certain consecutive period of time from 'sales_track' table from users that has minimum of two or more transactions (rephrase: How many users have 2 or more transactions in period of n-days without skipping even a day)
sales_track
sales_tx_id | u_id | create_date | item_id | price
------------|------|-------------|---------|---------
ffff-0291 | 0001 | 2019-08-01 | 0300 | 5.00
ffff-0292 | 0001 | 2019-08-01 | 0301 | 2.50
ffff-0293 | 0002 | 2019-08-01 | 0209 | 3.50
ffff-0294 | 0003 | 2019-08-01 | 0020 | 1.00
ffff-0295 | 0001 | 2019-08-02 | 0301 | 2.50
ffff-0296 | 0001 | 2019-08-02 | 0300 | 5.00
ffff-0297 | 0001 | 2019-08-02 | 0209 | 3.50
ffff-0298 | 0002 | 2019-08-02 | 0300 | 5.00
For simplicity sake sample is for two consecutive days (period of time is between 2019-08-01 and 2019-08-02) only, in real operation I would have to search eg. 10 consecutive days transaction.
I'm able so far to find the minimum two or more transactions.
SELECT user_id, COUNT (user_id) FROM sales_track WHERE created_at BETWEEN
('2019-08-01') AND ('2019-08-02')
GROUP BY u_id HAVING COUNT (sales_tx_id) >= 2;
The output I'm looking for is like:
u_id | tx_count | tx_amount
------|----------|------------
0001 | 5 | 18.50
Thank you in advance your help.
Upvotes: 0
Views: 68
Reputation: 23736
First: My extended data set:
sales_tx_id | user_id | created_at | item_id | price
:---------- | :------ | :--------- | :------ | ----:
ffff-0291 | 0001 | 2019-08-01 | 0300 | 5.00
ffff-0292 | 0001 | 2019-08-01 | 0301 | 2.50
ffff-0293 | 0002 | 2019-08-01 | 0209 | 3.50
ffff-0294 | 0003 | 2019-08-01 | 0020 | 1.00
ffff-0295 | 0001 | 2019-08-02 | 0301 | 2.50
ffff-0296 | 0001 | 2019-08-02 | 0300 | 5.00
ffff-0297 | 0001 | 2019-08-02 | 0209 | 3.50
ffff-0298 | 0002 | 2019-08-02 | 0300 | 5.00
ffff-0299 | 0001 | 2019-08-05 | 0209 | 3.50
ffff-0300 | 0001 | 2019-08-05 | 0020 | 1.00
ffff-0301 | 0001 | 2019-08-06 | 0209 | 3.50
ffff-0302 | 0001 | 2019-08-06 | 0020 | 1.00
ffff-0303 | 0001 | 2019-08-07 | 0209 | 3.50
ffff-0304 | 0001 | 2019-08-07 | 0020 | 1.00
ffff-0305 | 0002 | 2019-08-08 | 0300 | 5.00
ffff-0306 | 0002 | 2019-08-08 | 0301 | 2.50
ffff-0307 | 0001 | 2019-08-09 | 0209 | 3.50
ffff-0308 | 0001 | 2019-08-09 | 0020 | 1.00
ffff-0309 | 0002 | 2019-08-09 | 0300 | 5.00
ffff-0310 | 0002 | 2019-08-09 | 0301 | 2.50
ffff-0311 | 0001 | 2019-08-10 | 0209 | 3.50
ffff-0312 | 0001 | 2019-08-10 | 0020 | 1.00
ffff-0313 | 0002 | 2019-08-10 | 0300 | 5.00
User 1 has 3 streaks:
User 2:
So we are expecting 4 rows: 3 for each user 1 streak, 1 for user 2
SELECT -- 4
user_id,
SUM(count),
SUM(price),
MIN(created_at) AS consecutive_start
FROM (
SELECT *, -- 3
SUM(is_in_same_group) OVER (PARTITION BY user_id ORDER BY created_at) AS group_id
FROM (
SELECT -- 2
*,
(lag(created_at, 1, created_at) OVER (PARTITION BY user_id ORDER BY created_at) + 1 <> created_at)::int as is_in_same_group
FROM (
SELECT -- 1
created_at,
user_id,
COUNT(*),
SUM(price) AS price
FROM
sales_track
WHERE created_at BETWEEN '2018-02-01' AND '2019-08-11'
GROUP BY created_at, user_id
HAVING COUNT(*) >= 2
) s
) s
) s
GROUP BY user_id, group_id
created_at
, user_id
) groups and remove those with COUNT() < 2
lag()
window function allows to get the value of the previous record within one ordered group. The group here is the user_id
. The check here is: If the current created_at value is the next to the previous (current + 1) then 0
, else 1
.SUM()
window function to sum these values: The value increases if the gap is too big (if value is 1
) otherwise it is the same value as the previous date. Now we got a group_id
for all dates that only differ +1SUM()
and COUNT()
Upvotes: 1