fatpandabot
fatpandabot

Reputation: 11

Sum rows in a table if one column's value is also in another table

I have two tables like the following. For each product and day, I'd like to sum the amount in table_1 if both the sender and receiver were top clients in table_2 for the same product and day.

Table_1:

product date sender receiver amount
X 2001-01-01 A B 5
X 2001-01-01 A C 2
X 2001-01-01 B D 1
X 2001-01-02 B C 10
X 2001-01-02 A D 3
Y 2001-01-01 A E 5

Table_2:

product date top client
X 2001-01-01 A
X 2001-01-01 B
X 2001-01-01 D
X 2001-01-02 B
X 2001-01-02 C
Y 2001-01-01 C

And the final output would have one row per product-day. For product X on date 2001-01-01, receiver C is not a top client in table_2, so the total_amount for X on 2001-01-01 is the sum of 5 and 1 only, not 2. For product X on 2001-01-02, only B and C were top clients, so the total_amount is only 10, not 10+3.

product date total_amount
X 2001-01-01 6
X 2001-01-02 10
Y 2001-01-01 0

I tried the following code:

select product, date, sum(amount) as total_amount from table_1 
  where (sender in (select top_client from table_2))
  and (receiver in (select top_client from table_2))
  group by product, date

However, this code looks at all the top_clients from table_2, not only the top_clients for a specific product-date. For example, receiver C was considered a top client for product X on date 2001-01-01, total_amount for that product-day ended up being 8 when it should be 6. The table I got was the following, and it's not what I want,

product date total_amount
X 2001-01-01 8
X 2001-01-02 13
Y 2001-01-01 0

I also tried to group the top_clients in table_2:

select product, date, sum(amount) as total_amount from table_1
where sender in unnest((select array_agg(top_client) from table_2 
group by product, date )) and receiver in unnest((select array_agg(top_client) from table_2 group by product, date ))
group by product, date

However, I got the following error: "Scalar subquery produced more than one element"

What is the right way of creating an array of top_clients for a product-date? Thank you in advance.

Upvotes: 0

Views: 105

Answers (1)

Samuel
Samuel

Reputation: 3528

You need to join table 2 to table 1 twice.

In the next step the amount is set to zero, if no sender or no receiver were joined.

This is a good solution for small datasets:

with tbl as (Select "X" product,    "01.01.2001" date,  "A" sender, "B" receiver,   5 amount
Union ALL Select    "X",    "01.01.2001",   "A",    "C",2
Union ALL Select    "X",    "01.01.2001",   "B",    "D",1
Union ALL Select    "X",    "02.01.2001",   "B",    "C",10,
Union ALL Select    "X",    "02.01.2001",   "A",    "D",3,
Union ALL Select    "Y",    "01.01.2001",   "A",    "E",5,
),
table_2 as (Select  "X" product,    "01.01.2001" date , "A" top_client,
Union ALL Select    "X",    "01.01.2001",   "B",
Union ALL Select    "X",    "01.01.2001",   "D",
Union ALL Select    "X",    "02.01.2001",   "B",
Union ALL Select    "X",    "02.01.2001",   "C",
Union ALL Select    "Y",    "01.01.2001",   "C",
)

select 
tbl.product,    tbl.date,
sum(if(A.top_client is not null and B.top_client is not null, amount,0))    total_amount
#tbl.*,
#A.top_client as sender_ok,
#B.top_client as receiver_ok,
from tbl
left join table_2 A
on tbl.product=A.product and tbl.date=A.date and tbl.sender=A.top_client

left join table_2 B
on tbl.product=B.product and tbl.date=B.date and tbl.receiver=B.top_client

#where A.top_client is not null and B.top_client is not null 
group by 1,2

For large datasets, first merge both tables by union them instead of a join condition. The flaggy column is to distinguish these two. Please mark the columns, which are only present in one table by null in the other one. Then use two windows function (over) for the search of the sender/ receiver. The max(flaggy) will be one if the sender is present in table 2.

with tbl as (Select "X" product,    "01.01.2001" date,  "A" sender, "B" receiver,   5 amount
Union ALL Select    "X",    "01.01.2001",   "A",    "C",2
Union ALL Select    "X",    "01.01.2001",   "B",    "D",1
Union ALL Select    "X",    "02.01.2001",   "B",    "C",10,
Union ALL Select    "X",    "02.01.2001",   "A",    "D",3,
Union ALL Select    "Y",    "01.01.2001",   "A",    "E",5,
),
table_2 as (Select  "X" product,    "01.01.2001" date , "A" top_client,
Union ALL Select    "X",    "01.01.2001",   "B",
Union ALL Select    "X",    "01.01.2001",   "D",
Union ALL Select    "X",    "02.01.2001",   "B",
Union ALL Select    "X",    "02.01.2001",   "C",
Union ALL Select    "Y",    "01.01.2001",   "C",
),
tbl_merge as (
Select 0 as flaggy, # to distinguish the tables
 product, date, sender, receiver, 
 amount 
from tbl
union all
select 1, # flaggy set to one 
 product, date, top_client,top_client, 
 null # beacuse amount column missing
from table_2

)

select 
product,date,
sum(if(flaggy_test=2, amount,0)) as total_amount 
from (
select *,
max(flaggy) over (partition by product,date,sender)+
max(flaggy) over (partition by product,date,receiver) as flaggy_test
from tbl_merge
)
where flaggy=0
group by 1,2
order by 1 desc

Upvotes: 1

Related Questions