Luke De Feo
Luke De Feo

Reputation: 2165

Batch insert data counting new inserts

Suppose i have a simple schema with a composite pk with columns. e.g

pk1: string

pk2: int

date: Timestamp

I am reading data from somewhere else in batches of about 50 and would like to store this. The data source i am pulling from is a sliding window so I will be receiving data from the data source that i have already inserted so i cant just blindly insert otherwise i get a pk constraint violation.

I would like a reasonable way to insert the new items as a batch but also knowing how many new items i actually inserted for logging purposes.

Upvotes: 1

Views: 102

Answers (1)

Haleemur Ali
Haleemur Ali

Reputation: 28253

doing the insert

For postgresql version 9.5+, it is possible to use the following:

insert ... on conflict do nothing

example:

INSERT INTO users (id, user_name, email) 
VALUES (1, 'hal', '[email protected]') 
ON CONFLICT DO NOTHING

For recent earlier versions (since 9+, i think), it is possible to create a CTE from raw values & then insert from there:

WITH batch (id, user_name, email) AS (
VALUES
  (1, 'hal', '[email protected]'),
  (2, 'sal', '[email protected]')
)
INSERT INTO users (id, user_name, email) (
SELECT id, user_name, email
FROM batch
WHERE batch.id NOT IN (SELECT id FROM users)
)

or, instead of using a CTE, stage the values in a staging table that is truncated after every batch is processed.

Also, note that it might be necessary to explicitly cast strings to appropriate data types if the CTE method is used.

A third option would be to implement this using a stored procedure & trigger. This is more complicated than the other two, but would work with earlier versions of postgresql.

logging

Both of those methods should report the number of rows inserted, but the logging would have to be performed by the database client.

e.g. in Python, the library psycopg2 is used to interact with postgresql, and psycopg2 cursor objects have a property rowcount. I'm sure other well designed libraries written in other languages / frameworks will have implemented this same functionality somehow. Logging the # of rows inserted will have to be done from the part of the program interacting with the database.

However, if the logs of how many rows are inserted are required in the same database, then both the upsert & the logging may be performed via a single trigger + stored procedure.

Finally, as this is a special case of upsert, more information can be found by searching postgresql upsert on stack overflow or other sites. I found the following from the postgresql wiki very informative:

https://wiki.postgresql.org/wiki/UPSERT#PostgreSQL_.28today.29

Upvotes: 1

Related Questions