John Trevor
John Trevor

Reputation: 21

Get unique rows from two tables, but keep duplicates from the same table

I want to split a table into two tables (or more, but let's say two).

table_original
id column1 column2
 1   1        2
 2   1        3
 3   1        4
 4   1        4
 5   1        5

We can also assume that id is a unique identifier. Now I split this table into two, by using a CREATE TABLE table1 AS SELECT * FROM table_original WHERE column2 <= 4 and CREATE TABLE table2 AS SELECT * FROM table_original WHERE column2 >= 4. Now I have these two tables:

table1
id column1 column2
 1   1        2
 2   1        3
 3   1        4
 4   1        4
table2
id column1 column2
 3   1        4
 4   1        4
 5   1        5

How to get the same results from those two tables that I can get from the original table? If I run a query SELECT * FROM table1 UNION SELECT * FROM table2 it will be the same as SELECT * FROM table_original because of the unique id value, however if I run a query SELECT column1, column2 FROM table1 UNION SELECT column1, column2 FROM table2 it returns:

column1, column2
   1        2
   1        3
   1        4
   1        5

which is not the same as SELECT column1, column2 FROM table_original, which returns:

column1, column2
   1        2
   1        3
   1        4
   1        4
   1        5

Duplicates from the same table are removed. However, if I wanted to let's say do a count on duplicates, the results will be different, which is bad. So is there a way to do a UNION type operation but keep duplicates that are found in the same table?

Upvotes: 1

Views: 768

Answers (3)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656301

The UNION on whole rows in your solution will be painfully expensive for big tables (and wide rows). And it fails outright with any column type that doesn't support the equality operator (like json). See:

This query is substantially faster, making use of the unique index on table1(id). (Create that index if you don't have it!)

SELECT column1, column2
FROM   table1  -- bigger table first to micro-optimize some more

UNION ALL 
SELECT column1, column2
FROM   table2 t2
WHERE  NOT EXISTS (SELECT FROM table1 WHERE id = t2.id)

See:

About UNION ALL (as opposed to just UNION):

The question remains: Why keeps completely duplicate rows in multiple tables?

Upvotes: 2

John Trevor
John Trevor

Reputation: 21

I've figured out the answer.

To keep the duplicates found in the same table, but eliminate everything else, I used a query SELECT column1, column2 FROM (SELECT * FROM table1 UNION SELECT * FROM table2) AS t;

This way the UNION uses the unique id values to eliminate real duplicates, and after that I just filter the result to get the columns I need.

Upvotes: 0

eshirvana
eshirvana

Reputation: 24568

not sure what are you trying to achieve but you need to use union all:

SELECT column1, column2 FROM table1 
UNION ALL 
SELECT column1, column2 FROM table2

union all keeps the duplicates

Upvotes: 1

Related Questions