Reputation: 1401
I have a perfect code that compares the data from one table with another (see below) which works totally fine and runs fine as well in BigQuery:
with source1 as (
select
b.id,
b.qty,
a.price
from <table> as a
,unnest <details> as b
where b.status != 'canceled'
),
source2 as (
select id_, qty_, price_ from <table2>
where city != 'delhi'
)
select *
from source1 s1
full outer join source2 s2
on id = id_
where format('%t', s1) != format('%t', s2)
However, the code above runs into an error in sqlfluff i.e a certain SQL formatting rules checker that I can't bypass or turn off, see the error from sqlfluff below:
ERROR FROM SQLFLUFF:
*'s1' found in select with more than one referenced table/view' and 's2' found in select with more than one referenced table/view
Does anybody know how I can fix it ?
Upvotes: 0
Views: 871
Reputation: 1401
I fixed the code by adding extra CTE and adjusting the where caluse this would not flag the sqlfluff rules:
To fix that, I tried the second code (see below):
the code seems to work but it would be great if someone could suggest how I can reduce the CTEs, considering the sqlfluff will not fail:
with s1 as (
select
b.id,
b.qty,
a.price
from <table> as a
,unnest <details> as b
where b.status != 'canceled'
),
s2 as (
select id_, qty_, price_ from <table2>
where city != 'delhi'
),
,concat_s1 as (
select
*
, format('%t',s1) as l1
from s1
)
,concat_s2 as (
select
*
, format('%t',s2) as l2
from s2
)
, combined as (
select
source1.*
,source2.*
from concat_s1 as source1
full outer join concat_s2 as source2
on source1.id_ = source2.id_
where source1.l1 != source2.l2
or source1.id is null or source2.id_ is null
Upvotes: 0
Reputation: 3538
Windows function may perform better.
Instead of joining, the tables are unioned. Then a window function will search for combinations. As you tagged this question [Big-Query] it is tested for BigQuery:
with s1 as (
select id, qty, city from <table> where x != 'pending'
),
s2 as (
select id_, qty_, city_ from <table2>
),
concat_ as (
select 1 as dummy, * , format('%t',s1) as dummy_all
from s1
union all select 2 as dummy, * , format('%t',s2) as dummy_all
from s2
)
,combine as (
select *,
sum(if(dummy=1,1,0)) over win1 = sum(if(dummy=2,1,0)) over win1 as dummy_flag
from concat_
window win1 as (partition by id,dummy_all)
)
Select * from combine
where dummy_flag is false
Upvotes: 0