spe92
spe92

Reputation: 53

Better Design for eliminate duplicates after union query

I'm designing a UNION query to merge two tables with customer informations in a oracle 11g database. The first table a is the 'major' source the second table b is a additional source with new and duplicate entries.

The duplicates in b can not be eliminated by using UNION in fact of not equal fields like the autoincremented ID which is necessary to select.

Table a

ID CUSTOMER_NUMBER NAME STREET 1 4711 Dirk Downstreet 4 2 4721 Hans Mainstreet 5

Table b

ID CUSTOMER_NUMBER NAME STREET 44 4711 Dirk Downstreet 4 <== Duplicate 4 4741 Harry Crossroad 9 <== new

Expected Result

ID CUSTOMER_NUMBER NAME STREET DATASOURCE 1 4711 Dirk Downstreet 4 SAP <== from a 2 4721 Hans Mainstreet 5 SAP <== from a 4 4741 Harry Crossroad 9 MANUAL <== from b

I'm pretty happy with the following - simplified - trial:

SELECT CUSTOMER_NUMBER, 
    MAX(ID) KEEP (DENSE_RANK FIRST ORDER BY DATASOURCE DESC) ID,
    MAX(NAME) KEEP (DENSE_RANK FIRST ORDER BY DATASOURCE DESC) NAME,
    MAX(STREET) KEEP (DENSE_RANK FIRST ORDER BY DATASOURCE DESC) STREET,
FROM 
    (SELECT "ID","CUSTOMER_NUMBER","NAME","STREET", 'SAP' as DATASOURCE FROM CUSTOMERS
        UNION ALL
    SELECT "ID","CUSTOMER_NUMBER","NAME","STREET", 'MANUAL' as DATASOURCE FROM CUSTOMERS_MANUAL) united
group by CUSTOMER_NUMBER

But I have to SELECT every single field by DENSE_RANK FIRST ORDER BY DATASOURCE DESC which are about 20 fields ...

Can anyone show me a better approch?

Upvotes: 1

Views: 54

Answers (1)

Marmite Bomber
Marmite Bomber

Reputation: 21043

The alternative to KEEP for each column is to use ROW_NUMBER with the partitioning on your unique key and propper order and select only the row with the number one.

Example for CUSTOMER_NUMBERas unique key, prefering MANUALover SAP and expecting the ID is unique in each source.:

SELECT * FROM 
(
SELECT 
   "ID","CUSTOMER_NUMBER","NAME","STREET",
   roww_number() over (partition by CUSTOMER_NUMBER order by decode(DATASOURCE,'SAP',2,'MANUAL',1), ID) as RN
FROM 
    (SELECT   "ID","CUSTOMER_NUMBER","NAME","STREET", 'SAP' as DATASOURCE FROM CUSTOMERS
        UNION ALL
     SELECT   "ID","CUSTOMER_NUMBER","NAME","STREET", 'MANUAL' as DATASOURCE FROM CUSTOMERS_MANUAL) united
) WHERE RN = 1

This works fine even in case that the individual sources deliver duplicates. Adjust the order columns so, that the query remains deterministic, i.e. repeated query provides the same result (e.g. add NAME if the ID column can be duplicated in SAP)

Upvotes: 2

Related Questions