Reputation: 13
I have a table with orders (for each order (transaction_id) there are client_id and item_id and quantity) I need to find completely identical orders for a set of goods and their quantity in the order
CREATE TABLE IF NOT EXISTS orders (client_id varchar(10),
item_id varchar(10), quantity int, transaction_id varchar(10));
INSERT INTO orders values
('CL1111','111',1, '1001'),
('CL1111','222',2,'1001'),
('CL1111','333',1,'1001'),
('CL2222','111',2,'1002'),
('CL2222','222',1,'1002'),
('CL2222','333',1,'1002'),
('CL3333','111',1,'1003'),
('CL3333','222',2,'1003'),
('CL3333','333',1,'1003'),
('CL3333','444',1,'1003'),
('CL4444','111',1,'1004'),
('CL4444','222',2,'1004'),
('CL4444','333',1,'1004'),
('CL5555','111',1,'1005'),
('CL5555','222',2,'1005'),
('CL6666','111',1,'1006'),
('CL6666','222',2,'1006'),
('CL6666','333',1,'1007')
client_id | item_id | quantity | transaction_id |
---|---|---|---|
CL1111 | 111 | 1 | 1001 |
CL1111 | 222 | 2 | 1001 |
CL1111 | 333 | 1 | 1001 |
CL2222 | 111 | 2 | 1002 |
CL2222 | 222 | 1 | 1002 |
CL2222 | 333 | 1 | 1002 |
CL3333 | 111 | 1 | 1003 |
CL3333 | 222 | 2 | 1003 |
CL3333 | 333 | 1 | 1003 |
CL3333 | 444 | 1 | 1003 |
CL4444 | 111 | 1 | 1004 |
CL4444 | 222 | 2 | 1004 |
CL4444 | 333 | 1 | 1004 |
CL5555 | 111 | 1 | 1005 |
CL5555 | 222 | 2 | 1005 |
CL6666 | 111 | 1 | 1006 |
CL6666 | 222 | 2 | 1006 |
CL6666 | 333 | 1 | 1007 |
Identical orders here are: (1001 and 1004), (1005 and 1006) And now I don't know, how to leave orders, which completely identical on item_id, quantity and set of items.
Upvotes: 1
Views: 64
Reputation: 13
select
transaction_id
from
(
select
transaction_id,
ai,
aq,
count(1) over (partition by ai, aq) cnt
from
(
select
transaction_id,
array_agg(array[item_id]) ai,
array_agg(array[quantity]) aq
from orders
group by transaction_id
) a
) a
where cnt > 1
Upvotes: 0
Reputation: 5633
Try this query. Order clause in aggregation is necessary for reliable result.
with transactions as(
select client_id,transaction_id
,string_agg( (item_id ||'('||cast(quantity as varchar) || ')') ,',' order by item_id) itemlist
from orders
group by client_id,transaction_id
)
select *
,row_number()over(partition by itemlist order by client_id,transaction_id) rn
,count(*)over(partition by itemlist) cnt
from transactions
order by itemlist
Rows with cnt>1 is has identical row (identical items with same quantity).
With test data, find 2 identical groups for transaction_id 1001-1004 and 1005-1006 (not green in your picture)
result is
client_id | transaction_id | itemlist | rn | cnt |
---|---|---|---|---|
CL5555 | 1005 | 111(1),222(2) | 1 | 2 |
CL6666 | 1006 | 111(1),222(2) | 2 | 2 |
CL1111 | 1001 | 111(1),222(2),333(1) | 1 | 2 |
CL4444 | 1004 | 111(1),222(2),333(1) | 2 | 2 |
CL3333 | 1003 | 111(1),222(2),333(1),444(1) | 1 | 1 |
CL2222 | 1002 | 111(2),222(1),333(1) | 1 | 1 |
CL6666 | 1007 | 333(1) | 1 | 1 |
Test data
INSERT INTO orders (client_id,item_id,transaction_id,quantity)
values
('CL1111','111','1001',1),
('CL1111','222','1001',2),
('CL1111','333','1001',1),
('CL2222','111','1002',2),
('CL2222','222','1002',1),
('CL2222','333','1002',1),
('CL3333','111','1003',1),
('CL3333','222','1003',2),
('CL3333','333','1003',1),
('CL3333','444','1003',1),
('CL4444','111','1004',1),
('CL4444','222','1004',2),
('CL4444','333','1004',1),
('CL5555','111','1005',1),
('CL5555','222','1005',2),
('CL6666','111','1006',1),
('CL6666','222','1006',2),
('CL6666','333','1007',1)
;
Upvotes: 0
Reputation: 14848
Please check this query, it returns correct rows in dbfiddle
with agg as (
SELECT client_id,
array_agg(array[item_id]) ai, array_agg(array[quantity]) aq
from orders group by client_id
having count(distinct transaction_id) = 1 )
select client_id, ai items, aq quantities
from (
select client_id, ai , aq , count(1) over (partition by ai, aq) cnt
from agg ) c
where cnt > 1
client_id | items | quantities |
---|---|---|
CL1111 | {{111},{222},{333}} | {{1},{2},{1}} |
CL4444 | {{111},{222},{333}} | {{1},{2},{1}} |
I am not sure what about clients having several transactions, but in your examples clients CL5555 and CL6666 are not matched even though they have same transactions 1005 and 1006. So here only clients having one distinct transaction are compared.
Upvotes: 0
Reputation: 311
Your example code doesn't include quantity! Here is a code that results as many rows as the table contains the same orders:
sql = """
with orders_mod as
(
select distinct
"transaction_id",
string_agg("item_id" || ' ' || "quantity"::text, ',') over (partition by "client_id") as "list_of_ordered_item_ids"
from orders
)
select distinct
string_agg("transaction_id", ',') over (partition by "list_of_ordered_item_ids") as "list_of_the_same_orders",
"list_of_ordered_item_ids"
from orders_mod
;
"""
df_SQL = pd.read_sql(sql, con)
df_SQL
Upvotes: 0