Reputation: 21
I want to split a table into two tables (or more, but let's say two).
table_original
id column1 column2
1 1 2
2 1 3
3 1 4
4 1 4
5 1 5
We can also assume that id
is a unique identifier. Now I split this table into two, by using a CREATE TABLE table1 AS SELECT * FROM table_original WHERE column2 <= 4
and CREATE TABLE table2 AS SELECT * FROM table_original WHERE column2 >= 4
. Now I have these two tables:
table1
id column1 column2
1 1 2
2 1 3
3 1 4
4 1 4
table2
id column1 column2
3 1 4
4 1 4
5 1 5
How to get the same results from those two tables that I can get from the original table? If I run a query SELECT * FROM table1 UNION SELECT * FROM table2
it will be the same as SELECT * FROM table_original
because of the unique id
value, however if I run a query SELECT column1, column2 FROM table1 UNION SELECT column1, column2 FROM table2
it returns:
column1, column2
1 2
1 3
1 4
1 5
which is not the same as SELECT column1, column2 FROM table_original
, which returns:
column1, column2
1 2
1 3
1 4
1 4
1 5
Duplicates from the same table are removed. However, if I wanted to let's say do a count on duplicates, the results will be different, which is bad. So is there a way to do a UNION
type operation but keep duplicates that are found in the same table?
Upvotes: 1
Views: 768
Reputation: 656301
The UNION
on whole rows in your solution will be painfully expensive for big tables (and wide rows). And it fails outright with any column type that doesn't support the equality operator (like json
). See:
This query is substantially faster, making use of the unique index on table1(id)
. (Create that index if you don't have it!)
SELECT column1, column2
FROM table1 -- bigger table first to micro-optimize some more
UNION ALL
SELECT column1, column2
FROM table2 t2
WHERE NOT EXISTS (SELECT FROM table1 WHERE id = t2.id)
See:
About UNION ALL
(as opposed to just UNION
):
The question remains: Why keeps completely duplicate rows in multiple tables?
Upvotes: 2
Reputation: 21
I've figured out the answer.
To keep the duplicates found in the same table, but eliminate everything else, I used a query SELECT column1, column2 FROM (SELECT * FROM table1 UNION SELECT * FROM table2) AS t;
This way the UNION uses the unique id
values to eliminate real duplicates, and after that I just filter the result to get the columns I need.
Upvotes: 0
Reputation: 24568
not sure what are you trying to achieve but you need to use union all
:
SELECT column1, column2 FROM table1
UNION ALL
SELECT column1, column2 FROM table2
union all keeps the duplicates
Upvotes: 1