rossi
rossi

Reputation: 314

unique constraint violation on a row causes entire pq.CopyIn postgresql import to fail

I'm trying to use pq.CopyIn to do bulk imports as described here:

https://godoc.org/github.com/lib/pq

The import is much faster than other methods I've tried but I am finding that a unique constraint violation in just one record will cause the entire import to fail.

Is there any way to set ON CONFLICT DO NOTHING using pq.CopyIn.

Here is a copy of my table structure

CREATE TABLE test (
    id serial PRIMARY KEY,
    unique_token VARCHAR ( 10 ) UNIQUE NOT NULL,
    frequency INT DEFAULT 0
);

I tried using @mkopriva answer below but I'm getting Error: pq: null value in column "id" violates not-null constraint

Code sample below

tx, _ := db.Begin()

_, err = tx.Exec(`CREATE TEMP TABLE token_temp ON COMMIT DROP AS 
   SELECT id, unique_token FROM test WITH NO DATA`)
if err != nil {
    return err
}

stmt, err := tx.Prepare(pq.CopyIn("token_temp", "unique_token"))
if err != nil {
    fmt.Println("error here")
    return err
}

for _, token := range tokenList {
    _, err = stmt.Exec(token)

    if err != nil {
        return err
    }

}

_, err = stmt.Exec()
if err != nil {
    log.Fatal(err)
}

err = stmt.Close()
if err != nil {
    log.Fatal(err)
}

_, err = tx.Exec(`INSERT INTO test SELECT id, unique_token FROM 
  token_temp   ON CONFLICT(unique_token) DO UPDATE SET frequency= 
   test.frequency + 1 `)
if err != nil {
    fmt.Println("Error")
    return err
}

err = tx.Commit()
if err != nil {
    log.Fatal(err)
}

Upvotes: 1

Views: 433

Answers (1)

mkopriva
mkopriva

Reputation: 38313

pq.CopyIn internally uses COPY FROM which has no support for the ON CONFLICT clause.

What you can do, however, is to create a temporary table that has no constraints, copy the data into that temporary table, and then do an INSERT into the target table, with your ON CONFLICT clause, using the temporary table as the source of the data to be inserted.

An example should make this more clear, say you have a users table that looks like this:

CREATE TABLE users (
    id serial PRIMARY KEY
    , name text
    , email text UNIQUE
);

And say you have a slice of users like this:

var users = []User{
    {Name: "John Doe", Email: "[email protected]"},
    {Name: "Joe Blow", Email: "[email protected]"},
    {Name: "Jane Doe", Email: "[email protected]"}, // duplicate email!
    {Name: "Foo Bar", Email: "[email protected]"},
}

With that you can do the following:

_, err = txn.Exec(`
CREATE TEMP TABLE users_temp
ON COMMIT DROP
AS SELECT * FROM users
WITH NO DATA`)
if err != nil {
    panic(err)
}

stmt, err := txn.Prepare(pq.CopyIn("users_temp", "name", "email"))
if err != nil {
    panic(err)
}

for _, u := range users {
    if _, err := stmt.Exec(u.Name, u.Email); err != nil {
        panic(err)
    }
}
if _, err := stmt.Exec(); err != nil {
    panic(err)
}
if err := stmt.Close(); err != nil {
    panic(err)
}

_, err = txn.Exec(`
INSERT INTO users (name, email)
SELECT name, email FROM users_temp
ON CONFLICT DO NOTHING`)
if err != nil {
    panic(err)
}

if err := txn.Commit(); err != nil {
    panic(err)
}

After you run the above you can do SELECT * FROM users; and you'll get this:

 id |   name   |       email
----+----------+-------------------
  1 | John Doe | [email protected]
  2 | Joe Blow | [email protected]
  4 | Foo Bar  | [email protected]
(3 rows)

For you specific example and requirement you can do something like this in the INSERT ... SELECT ... query:

_, err = txn.Exec(`
INSERT INTO test (unique_token, frequency)
SELECT unique_token, COUNT(*) FROM token_temp
GROUP BY unique_token`)
if err != nil {
    panic(err)
}

Upvotes: 6

Related Questions