Reputation: 113
I need to compare two tables data and check which attributed are mismatching, tables have same table definition, but the problem is i dint have a unique key to compare. I tried to use
CONCAT(CONCAT(CONCAT(table1.A, Table1.B))
=CONCAT(CONCAT(CONCAT(table2.A, Table2.B))
but still facing duplicate rows also tried NVL on few columns but didn't work
SELECT
UT.cat,
PD.cat
FROM
EM UT, EM_63 PD
WHERE
NVL(UT.cat, 1) = NVL(PD.cat, 1) AND
NVL(UT.AT_NUMBER, 1) = NVL(PD.AT_NUMBER, 1) AND
NVL(UT.OFFSET, 1) = NVL(PD.OFFSET, 1) AND
NVL(UT.PROD, 1) = NVL(PD.PROD, 1)
;
There are 34k records in one table 35k records in another table, but if I run the above query, the count of rows is 3 millions.
Columns in table:
COUNTRY
CATEGORY
TYPE
DESCRIPTION
Sample data :
Table 1 :
COUNTRY CATEGORY TYPE DESCRIPTION
US C T1 In
IN A T2 OUT
B C T2 IN
Y C T1 INOUT
Table 2:
COUNTRY CATEGORY TYPE DESCRIPTION
US C T2 In
IN B T2 Out
Q C T2 IN
Expected output:
column Matched unmatched
COUNTRY 2 1
CATEGORY 2 1
TYPE 2 1
DESCRIPTION 3 0
Upvotes: 1
Views: 454
Reputation: 44971
Start with this query to check if these 4 columns form a key.
select occ_total,occ_ut,occ_pd
,count(*) as records
from (select count (*) as occ_total
,count (case tab when 'UT' then 1 end) as occ_ut
,count (case tab when 'PD' then 1 end) as occ_pd
from select 'UT' as tab,cat,AT_NUMBER,OFFSET,PROD from EM
union all select 'PD' ,cat,AT_NUMBER,OFFSET,PROD from EM_63 PD
) t
group by cat,AT_NUMBER,OFFSET,PROD
) t
group by occ_total,occ_ut,occ_pd
order by records desc
;
After you have chosen your "key",you can use the following query to see the attributes' values
select count (*) as occ_total
,count (case tab when 'UT' then 1 end) as occ_ut
,count (case tab when 'PD' then 1 end) as occ_pd
,count (distinct att1) as cnt_dst_att1
,count (distinct att2) as cnt_dst_att2
,count (distinct att3) as cnt_dst_att3
,...
,listagg (case tab when 'UT' then att1 end) within group (order by att1) as att1_vals_ut
,listagg (case tab when 'PD' then att1 end) within group (order by att1) as att1_vals_pd
,listagg (case tab when 'UT' then att2 end) within group (order by att2) as att2_vals_ut
,listagg (case tab when 'PD' then att2 end) within group (order by att2) as att2_vals_pd
,listagg (case tab when 'UT' then att3 end) within group (order by att3) as att3_vals_ut
,listagg (case tab when 'PD' then att3 end) within group (order by att3) as att3_vals_pd
,...
from select 'UT' as tab,cat,AT_NUMBER,OFFSET,PROD,att1,att2,att3,... from E M
union all select 'PD' ,cat,AT_NUMBER,OFFSET,PROD,att1,att2,att3,... from EM_63 PD
) t
group by cat,AT_NUMBER,OFFSET,PROD
;
Upvotes: 1
Reputation:
In the most general case (when you may have duplicate rows, and you want to see which rows exist in one table but not in the other, and ALSO which rows may exist in both tables, but the row exists 3 times in the first table but 5 times in the other):
This is a very common problem with a settled "best solution" which for some reason it seems most people are still not aware of, even though it was developed on AskTom many years ago and has been presented numerous times.
You do NOT need a join, you do not need a unique key of any kind, and you don't need to read either table more than once. The idea is to add two columns to show from which table each row comes, do a UNION ALL, then GROUP BY all the columns except the "source" columns and show the count for each table. Something like this:
select count(t_1) as count_table_1, count(t_2) as count_table_2, col1, col2, ...
from (
select 'x' as t_1, null as t_2, col1, col2, ...
from table_1
union all
select null as t_1, 'x' as t_2, col1, col2, ...
from table_2
)
group by col1, col2, ...
having count(t_1) != count(t_2)
;
Upvotes: 2
Reputation: 14007
The problem with CONCAT
is, that you could get invalid matches, if your data looks similar to this:
table1.A = '123'
table1.B = '456'
concatenates to: '123456'
table2.A = '12'
table2.B = '3456'
concatenates also to: '123456'
You have to compare the fields individually: table1.A = table2.A AND table1.B = table2.B
Upvotes: 0