Reputation: 25

SQL Union not including duplicates based on single column?

I'm trying to union two tables but I need to essentially 'prefer' the first table using just one 'id' column. If an 'id' appears in the second table that already exists in the first, I do not want to include that record.

Query looks like this

            select id, col2, col3
            from table(p_package.getData(param))

            union

            select id, col2, col3 
            from table1         
            where col7 = 'pass'
            and col8 <> 'A' 
            and col9 = to_date(Date, 'mm/dd/yyyy')

the p_package.getData(param) is a pipelined function which returns a table. I would like to avoid calling this twice for performance reasons

Upvotes: 0

Answers (3)

Boap

Reputation: 25

The other solutions work but I opted to use a common table expression as suggested by xQbert

        with cte as
        (select id, col2, col3
        from table(p_package.getData(param)))

        select * from cte

        union

        select id, col2, col3 
        from table1         
        where col7 = 'pass'
        and col8 <> 'A' 
        and col9 = to_date(Date, 'mm/dd/yyyy')
        and id not in (select id from cte)

EDIT: I realized that a CTE does not actually store the data returned by a query but stores the query itself instead. While this works it does not avoid calling the pipelined function twice

Upvotes: 0

MT0

Reputation: 167774

You can use the ROW_NUMBER() analytic function to remove the duplicates:

SELECT id, col2, col3
FROM   (
  SELECT id, col2, col3,
         ROW_NUMBER() OVER ( PARTITION BY id ORDER BY priority ) AS rn
  FROM   (
    select id, col2, col3, 1 AS priority
    from   table(p_package.getData(param))
  UNION ALL
    select id, col2, col3, 2
    from table1         
    where col7 = 'pass'
    and   col8 <> 'A' 
    and   col9 = to_date(Date, 'mm/dd/yyyy')
  )
)
WHERE rn = 1

and as a bonus, since you're filtering the duplicates elsewhere, you could change UNION to UNION ALL.

If you can have duplicates id values from the pipelined function and you want those but not any from table1 then:

SELECT id, col2, col3
FROM   (
  SELECT id, col2, col3, priority
         ROW_NUMBER() OVER ( PARTITION BY id ORDER BY priority ) AS rn
  FROM   (
    select id, col2, col3, 1 AS priority
    from   table(p_package.getData(param))
  UNION ALL
    select id, col2, col3, 2
    from table1         
    where col7 = 'pass'
    and   col8 <> 'A' 
    and   col9 = to_date(Date, 'mm/dd/yyyy')
  )
)
WHERE priority = 1
OR    rn = 1

Upvotes: 1

Tim Biegeleisen

Reputation: 520898

Assuming you don't want to include any col1 value in the second half of the union which would introduce a value already included in the first half, you could use an exists clause:

select col1, col2, col3
from table(p_package.getData(param))
union
select col1, col2, col3 
from table1 t1
where col7 = 'pass' and col8 <> 'A'and col9 = to_date(Date, 'mm/dd/yyyy') and
      not exists (select 1 from table(p_package.getData(param)) t2
                  where t1.col1 = t2.col1);

Upvotes: 0

SQL Union not including duplicates based on single column?

Answers (3)

Related Questions