Reputation: 1401
I have the following code that I am using to compare two columns from different tables, the code almost runs perfectly if I don't have any where
clauses/filters in my code below.
If I add the where clause I do get extra rows that I don't want to see.
with source1 as (
select
b.id,
b.qty,
a.price
from <table> as a
,unnest <details> as b
where b.status != 'canceled'
),
source2 as (
select id_, qty_, price_ from <table2>
where city != 'delhi'
)
select *
from source1 s1
full outer join source2 s2
on id = id_
where format('%t', s1) != format('%t', s2)
Below is the sample data:
s1:
id qty price status
1 100 (null) canceled
2 0 100 done
3 0 80 canceled
4 50 90 done
5 20 100 done
6 20 100 done
7 80 80 done
8 100 100 canceled
9 40 0 done
10 11 22 done
11 40 40 done
12 null 90 done
s2:
id_ qty_ price_ city_
1 100 200 ny
2 0 100 ny
3 0 80 ny
4 50 80 ny
5 40 100 ny
6 40 40 ny
7 200 200 delhi
8 100 100 delhi
9 40 100 ny
10 11 22 delhi
12 11 11 ny
13 90 80 NY
Expected Results:
id qty price status id_ qty_ price_ city_
4 50 90 done 4 50 80 ny
5 20 100 done 5 40 100 ny
6 20 100 done 6 40 40 ny
9 40 0 done 9 40 100 ny
11 40 40 done null null null null
12 null 90 done 12 11 11 ny
null null null null 13 90 80 ny
In general, I want all the rows that have at least one mismatch in any of the columns (qty, price, status) but only when the status is not canceled or city is not equal to delhi and show the values of both tables (s1,s2) for each column in a single row (as shown in expected results)
If a row exists in one table and not the other and doesn't have status = canceled
or city = delhi
, then it should show up
if the city != delhi
and status != canceled
and the values are same for (qty,price,status) then, I don't want that row ***
Current issues:
where status != 'canceled'
-- *this will exclude all the rows with cases = canceled from source 1 but my source2 will still show the rows where the status is actually canceled, and will bring that row up which I don't want
similarly, source2
has a condition: where city != 'delhi'
and source1 not, this will again show rows which i don't want
If I pass the column status
and city
in the select
statements of my codes above (code 1 and code 2) it will be passed in the condition: where format('%t', s1) != format('%t', s2)
and so each time there will be a mismatch since city column doesn't exist in source1 and status one in the source2. The string/serial generated will not be able to match for example:
s1:
id, qty, price, status
1 10 100 cancelled
s2:
id_ qty_ price_ city
1 10 100 Delhi
where format('%t', s1) != format('%t', s2)
would generate:
(1,10,100,cancelled) != (1,10,100,delhi)
in this case, the columns have the same values (qty, price, status) but the row will still show up which I don't want due to the issue explained above.
Questions:
Is there a way I can pass specific columns to the part format('%t',s2)
, instead of passing the entire table name, that should solve the problem ? If I can somehow not let the status and city be part of serialization
How can I tackle the where clauses / filters in these cases, right now I only have one filter for each table but it can be more in the future.
How can I get the expected output ? I would prefer not to use any other method apart from this serialization i.e format('%t', s2) (if possible) as I already have most of the code and would like to adjust it to cover all the cases
Upvotes: 0
Views: 167
Reputation: 172994
Is there a way I can pass specific columns to the part format('%t',s2), instead of passing the entire table name, that should solve the problem ? If I can somehow not let the status and city be part of serialization
just use below
where format('%t', (id, qty, price)) != format('%t', (id_, qty_, price_))
Upvotes: 1