Reputation: 1017

SQL Union prefer records from one table?

I will start out by saying, I am working with two tables, that are a bit of a mess in terms of how they are organized. I am currently pleading my case to get this adjusted, but have no control over them. So the organization of the tables is out of my control.

I have one table that looks like this

Customer

| id | name | date_joined |
|----|------|-------------|
|  1 |  Bob |  2012-01-01 |
|  2 | Jack |  2012-01-01 |
|  3 | Jill |  2012-01-01 |

Since the customer table was created, another table has been created to hold customer information

Customer2

| id |    name | year_joined | month_joined | day_joined |
|----|---------|-------------|--------------|------------|
|  4 |     Ken |        2013 |            1 |          1 |
|  5 | Lindsey |        2013 |            1 |          1 |
|  1 |     Bob |        2012 |            1 |          1 |

You will notice that while Customer2 contains new customers for the first two entries it also contains Bob, who is the same bob that is in the first table with updated information to match the new Customer2 table layout.

I need to select all of these records into one result set based on a list of ID's, it is assumed that the ID's are unique. Therefore we can be sure that Bob in Customer is the same as Bob in Customer2. I am currently doing this with a union statement like this.

select * 
from (
select id, 
name, 
date_joined, 
'' as year_joined, 
'' as month_joined, 
'' as day_joined 
from customer 
union 
select id,
name,
'' as date_joined,
year_joined,
month_joined,
day_joined 
from customer2 ) 
as U where U.id in (list of ID's)

However this results in a table that looks like this

| id |    name | date_joined | year_joined | month_joined | day_joined |
|----|---------|-------------|-------------|--------------|------------|
|  4 |     Ken |             |        2013 |            1 |          1 |
|  5 | Lindsey |             |        2013 |            1 |          1 |
|  1 |     Bob |             |        2012 |            1 |          1 |
|  1 |     Bob |  2012-01-01 |             |              |            |
|  2 |    Jack |  2012-01-01 |             |              |            |
|  3 |    Jill |  2012-01-01 |             |              |            |

As we can see we have 'duplicate' records for Bob, which are not really duplicates. My question is this: When I encounter 'duplicates' like this, is there any way I can specify to only pick the record from the Customer2 table? I need this because I do not want duplicates and prefer to keep the record that matches the latest table layout.

Upvotes: 3

Answers (3)

Geoff

Reputation: 8850

Combine all the records using union all, adding a field to signal the priority for each source. Then use a windowing function to select the desired results based on that priority.

This should also scale better than the more traditional NOT IN or EXCEPT methods, if you have many rows over more input tables (think about 100K rows in each of 4 input tables, all of which need to be combined).

Example

If we have four source tables, each with a couple of key fields (K1, K2), and some optional fields (F1, F2), with table T1 being the highest priority to take from, and T4 being the lowest:

declare @T1 as table (K1 int, K2 int, F1 int, F2 int)
declare @T2 as table (K1 int, K2 int, F1 int, F2 int)
declare @T3 as table (K1 int, K2 int, F2 int)
declare @T4 as table (K1 int, K2 int, F1 int)

insert into @T1 values (1, 1, 1, 1)    -- We want this row
insert into @T1 values (2, 2, 2, 2)    -- We want this row
insert into @T2 values (1, 1, 2, 2)    -- Already in T1, should be eliminated
insert into @T3 values (1, 1, 2)       -- Already in T1, should be eliminated
insert into @T3 values (3, 3, 3)       -- We want this row
insert into @T4 values (1, 1, 2)       -- Already in T1, should be eliminated
insert into @T4 values (3, 3, 3)       -- Already in T3, should be eliminated
insert into @T4 values (4, 4, 4)       -- We want this row
;

-- Take all rows from all tables, and store the source priority of each:
with allRows as 
(
   select distinct K1, K2, F1,         F2,         1 as SRC from @T1
   union all
   select distinct K1, K2, F1,         F2,         2 as SRC from @T2
   union all
   select distinct K1, K2, null as F1, F2,         3 as SRC from @T3
   union all
   select distinct K1, K2, F1,         null as F2, 4 as SRC from @T4
)
select K1, K2, F1, F2, SRC
from
   (
      -- partition by key fields, order by our source priority:
      select K1, K2, F1, F2, SRC, row_number() over (partition by K1, K2 order by SRC) as RN
      from allRows
   ) as subQuery
where RN = 1

Results

`K1`	`K2`	`F1`	`F2`	`SRC`
1	1	1	1	1
2	2	2	2	1
3	3	NULL	3	3
4	4	4	NULL	4

Upvotes: 0

Abecee

Reputation: 2393

This should get you started:

SELECT
  id
  , name
  , date_joined
  , NULL year_joined
  , NULL month_joined
  , NULL day_joined
FROM Customer
WHERE id NOT IN (SELECT id FROM Customer2)
UNION ALL
SELECT
  id
  , name
  , NULL
  , year_joined
  , month_joined
  , day_joined
FROM Customer2
ORDER BY id
;

Alternatively, using just SET operations:

(SELECT
  id
  , name
  , date_joined
  , NULL year_joined
  , NULL month_joined
  , NULL day_joined
FROM Customer
EXCEPT
  (SELECT
    id
    , name
    , CONVERT(DATE,
            CAST(year_joined AS VARCHAR(4)) + '-'
              + CAST(month_joined AS VARCHAR(2)) + '-'
              + CAST(day_joined AS VARCHAR(2))
            , 102
           )
    , NULL
    , NULL
    , NULL
  FROM Customer2
  )
)
UNION ALL
SELECT
  id
  , name
  , NULL
  , year_joined
  , month_joined
  , day_joined
FROM Customer2
ORDER BY id
;

Either one takes from the Customer those, not found in Customer2, and combines them with all found in Customer2:

| id |    name | date_joined | year_joined | month_joined | day_joined |
|----|---------|-------------|-------------|--------------|------------|
|  1 |     Bob |      (null) |        2012 |            1 |          1 |
|  2 |    Jack |  2012-01-01 |      (null) |       (null) |     (null) |
|  3 |    Jill |  2012-01-01 |      (null) |       (null) |     (null) |
|  4 |     Ken |      (null) |        2013 |            1 |          1 |
|  5 | Lindsey |      (null) |        2013 |            1 |          1 |

See it in action: SQL Fiddle.

Myself though, I'd usually prefer a genuine date column over three columns with date particles...

Please comment, if and as this requires adjustment / further detail.

Upvotes: 1

Sam CD

Reputation: 2097

You would have to make the data match up exactly. Something like this could work:

select * 
from (
select id, 
name, 
datepart(year,date_joined) as year_joined, 
datepart(month,date_joined) as month_joined, 
datepart(day,date_joined) as day_joined 
from customer 
union 
select id,
name,
year_joined,
month_joined,
day_joined 
from customer2 ) 
as U where U.id in (list of ID's)

Upvotes: 0

SQL Union prefer records from one table?

Answers (3)

Example

Results

Related Questions