KVL
KVL

Reputation: 113

Comparing two tables that doesn't have unique key

I need to compare two tables data and check which attributed are mismatching, tables have same table definition, but the problem is i dint have a unique key to compare. I tried to use

CONCAT(CONCAT(CONCAT(table1.A, Table1.B))
=CONCAT(CONCAT(CONCAT(table2.A, Table2.B))

but still facing duplicate rows also tried NVL on few columns but didn't work

SELECT  
    UT.cat,
    PD.cat
FROM 
    EM UT, EM_63 PD 
WHERE 
    NVL(UT.cat, 1) = NVL(PD.cat, 1) AND
    NVL(UT.AT_NUMBER, 1) = NVL(PD.AT_NUMBER, 1) AND
    NVL(UT.OFFSET, 1) = NVL(PD.OFFSET, 1) AND  
    NVL(UT.PROD, 1) = NVL(PD.PROD, 1)
;

There are 34k records in one table 35k records in another table, but if I run the above query, the count of rows is 3 millions.

Columns in table:

COUNTRY       
CATEGORY   
TYPE    
DESCRIPTION

Sample data :

Table 1 :

COUNTRY  CATEGORY TYPE   DESCRIPTION       
US          C       T1      In
IN          A       T2      OUT
B           C       T2      IN
Y           C       T1      INOUT

Table 2:

COUNTRY  CATEGORY TYPE   DESCRIPTION    
US          C       T2      In
IN          B        T2     Out
Q           C       T2      IN

Expected output:

column      Matched  unmatched
COUNTRY         2       1
CATEGORY        2       1
TYPE            2       1
DESCRIPTION     3       0

Upvotes: 1

Views: 454

Answers (3)

David דודו Markovitz
David דודו Markovitz

Reputation: 44971

Start with this query to check if these 4 columns form a key.

select      occ_total,occ_ut,occ_pd
           ,count(*)                as records

from       (select      count (*)                               as occ_total
                       ,count (case tab when 'UT' then 1 end)   as occ_ut
                       ,count (case tab when 'PD' then 1 end)   as occ_pd

            from                    select 'UT' as tab,cat,AT_NUMBER,OFFSET,PROD from EM
                        union all   select 'PD'       ,cat,AT_NUMBER,OFFSET,PROD from EM_63 PD
                        ) t

            group by    cat,AT_NUMBER,OFFSET,PROD
            ) t

group by    occ_total,occ_ut,occ_pd     

order by    records desc
;

After you have chosen your "key",you can use the following query to see the attributes' values

select      count (*)                               as occ_total
           ,count (case tab when 'UT' then 1 end)   as occ_ut
           ,count (case tab when 'PD' then 1 end)   as occ_pd

           ,count (distinct att1)                   as cnt_dst_att1
           ,count (distinct att2)                   as cnt_dst_att2
           ,count (distinct att3)                   as cnt_dst_att3
           ,...
           ,listagg (case tab when 'UT' then att1 end) within group (order by att1) as att1_vals_ut
           ,listagg (case tab when 'PD' then att1 end) within group (order by att1) as att1_vals_pd
           ,listagg (case tab when 'UT' then att2 end) within group (order by att2) as att2_vals_ut
           ,listagg (case tab when 'PD' then att2 end) within group (order by att2) as att2_vals_pd
           ,listagg (case tab when 'UT' then att3 end) within group (order by att3) as att3_vals_ut
           ,listagg (case tab when 'PD' then att3 end) within group (order by att3) as att3_vals_pd  
           ,...

from                    select 'UT' as tab,cat,AT_NUMBER,OFFSET,PROD,att1,att2,att3,... from E M
            union all   select 'PD'       ,cat,AT_NUMBER,OFFSET,PROD,att1,att2,att3,... from EM_63 PD
            ) t

group by    cat,AT_NUMBER,OFFSET,PROD
;

Upvotes: 1

user5683823
user5683823

Reputation:

In the most general case (when you may have duplicate rows, and you want to see which rows exist in one table but not in the other, and ALSO which rows may exist in both tables, but the row exists 3 times in the first table but 5 times in the other):

This is a very common problem with a settled "best solution" which for some reason it seems most people are still not aware of, even though it was developed on AskTom many years ago and has been presented numerous times.

You do NOT need a join, you do not need a unique key of any kind, and you don't need to read either table more than once. The idea is to add two columns to show from which table each row comes, do a UNION ALL, then GROUP BY all the columns except the "source" columns and show the count for each table. Something like this:

select   count(t_1) as count_table_1, count(t_2) as count_table_2, col1, col2, ...
from     (
           select 'x' as t_1, null as t_2, col1, col2, ... 
             from table_1
           union all
           select null as t_1, 'x' as t_2, col1, col2, ...
             from table_2
         )
group by col1, col2, ...
having   count(t_1) != count(t_2)
;

Upvotes: 2

Sefe
Sefe

Reputation: 14007

The problem with CONCATis, that you could get invalid matches, if your data looks similar to this:

table1.A = '123'
table1.B = '456'

concatenates to: '123456'

table2.A = '12'
table2.B = '3456'

concatenates also to: '123456'

You have to compare the fields individually: table1.A = table2.A AND table1.B = table2.B

Upvotes: 0

Related Questions