Reputation: 452
I am new to PostgreSQL (and databases in general) and was hoping to get some pointers on improving the efficiency of the following statement.
I am inserting data from one table into another and do not want to insert duplicate values. Each table has a rid column (a unique identifier) that is indexed and is the primary key.
I am currently using the following statement:
INSERT INTO table1 SELECT * FROM table2 WHERE rid NOT IN (SELECT rid FROM table1);
Right now table1 has 200,000 records and table2 has 20,000 records. table1 is going to keep growing (probably to around 2,000,000) and table2 will stay at around 20,000 records. The statement currently takes about 15 minutes to run. I am concerned that as table1 grows this is going to take way too long. Any suggestions?
Upvotes: 2
Views: 1653
Reputation: 125214
insert into table1
select t2.*
from table2 t2
left join table1 t1 on t1.rid = t2.rid
where t1.rid is null;
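To see how the planner handles this, the SELECT part can be run on its own under EXPLAIN ANALYZE, which executes the query and prints the plan with actual timings without inserting anything. This is only a sketch for checking the plan on your own data; whether it shows an anti join and an index scan on rid depends on your PostgreSQL version, statistics, and settings.

```sql
-- Runs only the SELECT, so table1 is not modified.
-- Look for an "Anti Join" node and for how table1 is scanned.
EXPLAIN ANALYZE
SELECT t2.*
FROM table2 t2
LEFT JOIN table1 t1 ON t1.rid = t2.rid
WHERE t1.rid IS NULL;
```

Running the same check on the original NOT IN query gives a side-by-side comparison of the two plans.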
Upvotes: 1
Reputation: 1904
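This should be more efficient than your current query, because PostgreSQL can plan NOT EXISTS as an anti join on rid, which it does not do for NOT IN with a subquery: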
INSERT INTO table1
SELECT *
FROM table2
WHERE NOT EXISTS (
    SELECT 1 FROM table1 WHERE table1.rid = table2.rid
);
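If you are on PostgreSQL 9.5 or later, another option (a different technique from the query above) is to let the primary key on rid reject the duplicates. A minimal sketch, assuming table1 and table2 have the same column layout and that rid is the primary key of table1, as described in the question:

```sql
-- Assumes PostgreSQL 9.5+ and a primary key or unique index on table1.rid.
-- Rows from table2 whose rid already exists in table1 are skipped
-- instead of raising a unique-violation error.
INSERT INTO table1
SELECT *
FROM table2
ON CONFLICT (rid) DO NOTHING;
```

I have not measured this against the NOT EXISTS version on tables of your size, so it is worth timing both.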
Upvotes: 3