Reputation: 11
I have two tables like the following. For each product and day, I'd like to sum the amount in table_1 if both the sender and receiver were top clients in table_2 for the same product and day.
Table_1:
product | date | sender | receiver | amount |
---|---|---|---|---|
X | 2001-01-01 | A | B | 5 |
X | 2001-01-01 | A | C | 2 |
X | 2001-01-01 | B | D | 1 |
X | 2001-01-02 | B | C | 10 |
X | 2001-01-02 | A | D | 3 |
Y | 2001-01-01 | A | E | 5 |
Table_2:
product | date | top client |
---|---|---|
X | 2001-01-01 | A |
X | 2001-01-01 | B |
X | 2001-01-01 | D |
X | 2001-01-02 | B |
X | 2001-01-02 | C |
Y | 2001-01-01 | C |
And the final output would have one row per product-day. For product X on date 2001-01-01, receiver C is not a top client in table_2, so the total_amount for X on 2001-01-01 is the sum of 5 and 1 only, not 2. For product X on 2001-01-02, only B and C were top clients, so the total_amount is only 10, not 10+3.
product | date | total_amount |
---|---|---|
X | 2001-01-01 | 6 |
X | 2001-01-02 | 10 |
Y | 2001-01-01 | 0 |
I tried the following code:
select product, date, sum(amount) as total_amount from table_1
where (sender in (select top_client from table_2))
and (receiver in (select top_client from table_2))
group by product, date
However, this code looks at all the top_clients from table_2, not only the top_clients for a specific product-date. For example, receiver C was considered a top client for product X on date 2001-01-01, total_amount for that product-day ended up being 8 when it should be 6. The table I got was the following, and it's not what I want,
product | date | total_amount |
---|---|---|
X | 2001-01-01 | 8 |
X | 2001-01-02 | 13 |
Y | 2001-01-01 | 0 |
I also tried to group the top_clients in table_2:
select product, date, sum(amount) as total_amount from table_1
where sender in unnest((select array_agg(top_client) from table_2
group by product, date )) and receiver in unnest((select array_agg(top_client) from table_2 group by product, date ))
group by product, date
However, I got the following error: "Scalar subquery produced more than one element"
What is the right way of creating an array of top_clients for a product-date? Thank you in advance.
Upvotes: 0
Views: 105
Reputation: 3528
You need to join table 2 to table 1 twice.
In the next step the amount
is set to zero, if no sender or no receiver were joined.
This is a good solution for small datasets:
with tbl as (Select "X" product, "01.01.2001" date, "A" sender, "B" receiver, 5 amount
Union ALL Select "X", "01.01.2001", "A", "C",2
Union ALL Select "X", "01.01.2001", "B", "D",1
Union ALL Select "X", "02.01.2001", "B", "C",10,
Union ALL Select "X", "02.01.2001", "A", "D",3,
Union ALL Select "Y", "01.01.2001", "A", "E",5,
),
table_2 as (Select "X" product, "01.01.2001" date , "A" top_client,
Union ALL Select "X", "01.01.2001", "B",
Union ALL Select "X", "01.01.2001", "D",
Union ALL Select "X", "02.01.2001", "B",
Union ALL Select "X", "02.01.2001", "C",
Union ALL Select "Y", "01.01.2001", "C",
)
select
tbl.product, tbl.date,
sum(if(A.top_client is not null and B.top_client is not null, amount,0)) total_amount
#tbl.*,
#A.top_client as sender_ok,
#B.top_client as receiver_ok,
from tbl
left join table_2 A
on tbl.product=A.product and tbl.date=A.date and tbl.sender=A.top_client
left join table_2 B
on tbl.product=B.product and tbl.date=B.date and tbl.receiver=B.top_client
#where A.top_client is not null and B.top_client is not null
group by 1,2
For large datasets, first merge both tables by union
them instead of a join condition. The flaggy
column is to distinguish these two. Please mark the columns, which are only present in one table by null
in the other one. Then use two windows function (over
) for the search of the sender/ receiver. The max(flaggy)
will be one if the sender is present in table 2.
with tbl as (Select "X" product, "01.01.2001" date, "A" sender, "B" receiver, 5 amount
Union ALL Select "X", "01.01.2001", "A", "C",2
Union ALL Select "X", "01.01.2001", "B", "D",1
Union ALL Select "X", "02.01.2001", "B", "C",10,
Union ALL Select "X", "02.01.2001", "A", "D",3,
Union ALL Select "Y", "01.01.2001", "A", "E",5,
),
table_2 as (Select "X" product, "01.01.2001" date , "A" top_client,
Union ALL Select "X", "01.01.2001", "B",
Union ALL Select "X", "01.01.2001", "D",
Union ALL Select "X", "02.01.2001", "B",
Union ALL Select "X", "02.01.2001", "C",
Union ALL Select "Y", "01.01.2001", "C",
),
tbl_merge as (
Select 0 as flaggy, # to distinguish the tables
product, date, sender, receiver,
amount
from tbl
union all
select 1, # flaggy set to one
product, date, top_client,top_client,
null # beacuse amount column missing
from table_2
)
select
product,date,
sum(if(flaggy_test=2, amount,0)) as total_amount
from (
select *,
max(flaggy) over (partition by product,date,sender)+
max(flaggy) over (partition by product,date,receiver) as flaggy_test
from tbl_merge
)
where flaggy=0
group by 1,2
order by 1 desc
Upvotes: 1