Reputation:
I have a large collection of raw data (around 300 million rows), roughly 10% of which is replicated. I need to get the data into a database, and for the sake of performance I'm trying to use SQL copy. The problem is that when I commit the data, primary key exceptions prevent any of it from being processed. Can I change the behavior of primary keys so that conflicting data is simply ignored, or replaced? I don't really care either way - I just need one unique copy of each row.
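Roughly, the semantics I'm after are what the MERGE below would give, but applied during the bulk load itself rather than via a second staged pass (Oracle-style syntax; the table and column names are just placeholders):

    -- Insert rows whose key is new; silently skip rows whose key already
    -- exists in the target (the "ignore" case; "replace" would add a
    -- WHEN MATCHED THEN UPDATE clause).
    MERGE INTO readings t
    USING readings_stage s
       ON (t.id = s.id)
    WHEN NOT MATCHED THEN
        INSERT (id, payload)
        VALUES (s.id, s.payload);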
Upvotes: 1
Views: 570
Reputation: 28837
Use a SELECT statement to select exactly the data you want to insert, without the duplicates.
Use that as the basis of a CREATE TABLE XYZ AS SELECT * FROM (query-just-non-dupes).
You might check out the ASKTOM ideas on how to select the non-duplicate rows.
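A minimal sketch of that approach, assuming Oracle and hypothetical names (a RAW_DATA staging table with key column ID and a PAYLOAD column):

    -- Build a clean copy in one pass, keeping exactly one row per key.
    CREATE TABLE clean_data NOLOGGING AS
    SELECT id, payload
    FROM (
        SELECT r.id,
               r.payload,
               ROW_NUMBER() OVER (PARTITION BY r.id ORDER BY r.ROWID) AS rn
        FROM raw_data r
    )
    WHERE rn = 1;

    -- Add the primary key to the deduplicated copy.
    ALTER TABLE clean_data ADD CONSTRAINT clean_data_pk PRIMARY KEY (id);

Building the clean table in a single pass and constraining it afterwards is usually cheaper than deleting scattered duplicates out of the big table.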
Upvotes: 0
Reputation:
That's what I was considering doing, but I was worried about the performance of getting rid of 30 million randomly placed rows in a 300 million row table. The duplicate data also has a spatial relationship, which is why I wanted to fix the problem while loading the data rather than after it is all loaded.
Upvotes: 0
Reputation: 1997
I think your best bet would be to drop the constraint, load the data, then clean up the duplicates and reapply the constraint.
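Something along these lines, assuming Oracle and hypothetical names (READINGS table, READINGS_PK constraint, ID key column):

    -- Disable the primary key so the bulk load cannot fail on duplicates.
    ALTER TABLE readings DISABLE CONSTRAINT readings_pk;

    -- ... run the bulk load of all ~300 million rows here ...

    -- Clean up: keep one row per key and delete the rest.
    DELETE FROM readings r
    WHERE r.ROWID NOT IN (
        SELECT MIN(r2.ROWID)
        FROM readings r2
        GROUP BY r2.id
    );

    -- Reapply the constraint once the data is unique again.
    ALTER TABLE readings ENABLE CONSTRAINT readings_pk;

Note that enabling the constraint recreates the primary key index across all 300 million rows, so expect that last step to take a while.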
Upvotes: 2