Reputation: 2165
Suppose i have a simple schema with a composite pk with columns. e.g
pk1: string
pk2: int
date: Timestamp
I am reading data from somewhere else in batches of about 50 and would like to store this. The data source i am pulling from is a sliding window so I will be receiving data from the data source that i have already inserted so i cant just blindly insert otherwise i get a pk constraint violation.
I would like a reasonable way to insert the new items as a batch but also knowing how many new items i actually inserted for logging purposes.
Upvotes: 1
Views: 102
Reputation: 28253
doing the insert
For postgresql version 9.5+, it is possible to use the following:
insert ... on conflict do nothing
example:
INSERT INTO users (id, user_name, email)
VALUES (1, 'hal', '[email protected]')
ON CONFLICT DO NOTHING
For recent earlier versions (since 9+, i think), it is possible to create a CTE from raw values & then insert from there:
WITH batch (id, user_name, email) AS (
VALUES
(1, 'hal', '[email protected]'),
(2, 'sal', '[email protected]')
)
INSERT INTO users (id, user_name, email) (
SELECT id, user_name, email
FROM batch
WHERE batch.id NOT IN (SELECT id FROM users)
)
or, instead of using a CTE, stage the values in a staging table that is truncated after every batch is processed.
Also, note that it might be necessary to explicitly cast strings to appropriate data types if the CTE method is used.
A third option would be to implement this using a stored procedure & trigger. This is more complicated than the other two, but would work with earlier versions of postgresql.
logging
Both of those methods should report the number of rows inserted, but the logging would have to be performed by the database client.
e.g. in Python
, the library psycopg2
is used to interact with postgresql, and psycopg2 cursor objects have a property rowcount
. I'm sure other well designed libraries written in other languages / frameworks will have implemented this same functionality somehow. Logging the # of rows inserted will have to be done from the part of the program interacting with the database.
However, if the logs of how many rows are inserted are required in the same database, then both the upsert & the logging may be performed via a single trigger + stored procedure.
Finally, as this is a special case of upsert
, more information can be found by searching postgresql upsert
on stack overflow or other sites. I found the following from the postgresql wiki very informative:
https://wiki.postgresql.org/wiki/UPSERT#PostgreSQL_.28today.29
Upvotes: 1