blue piranha
blue piranha

Reputation: 3874

Avoiding duplicate record insertion into SQL table

I have a windows service which basically watches a folder for any CSV file. Each record in the CSV file is inserted into a SQL table. If the same CSV file is put in that folder, it can lead to duplicate record entries in the table. How can I avoid duplicate insertions into the SQL table?

Upvotes: 0

Views: 1062

Answers (3)

Megadest
Megadest

Reputation: 634

Problems with the answers using WHERE NOT EXISTS are:

  1. performance -- row-by-row processing requires, potentially, a very large number of table scans against table
  2. NULL handling -- for every column where there might be NULLs you will have to write the matching condition in a more complicated way, like (a = @a OR (a IS NULL AND @a IS NULL)). Repeat that for 10 columns and viola - you hate SQL :)

A better answer would take into account the great SET processing capabilities that relational databases provide (in short -- never use row-by-row processing in SQL if you can avoid it. If you can't -- think again and avoid it anyway).

So for the answer:

  1. load (all) data into a temporary table (or a staging table that can be safely truncated before load)
  2. run the insert in a "set"-way:
INSERT INTO table (<columns>)
    select <columns> from #temptab
    EXCEPT
    select <columns> from table

Keep in mind that the EXCEPT is safely dealing with NULLs for every kind of column ;) as well as choosing a high-performance join type for matching (hash, loop, merge join) depending on the available indexes and table statistics.

Upvotes: 0

DpakG
DpakG

Reputation: 66

The accepted answer has a syntax error and is not compatible with relational databases like MySQL.

Specifically, the following is not compatible with most databases:

values(...) where not exists

While the following is generic SQL, and is compatible with all databases:

select ... where not exists

Given that, if you want to insert a single record into a table after checking if it already exists, you can do a simple select with a where not exists clause as part of your insert statement, like this:

INSERT 
INTO      table_name (
            primay_col, 
            col_1, 
            col_2
          )
SELECT    1234,
          'val_1',
          'val_2' 
WHERE     NOT EXISTS (
  SELECT  1 
  FROM    table_name 
  WHERE   primary_col=1234
);

Simply pass all values with the select keyword, and put the primary or unique key condition in the where clause.

Upvotes: 1

Ruslan Osipov
Ruslan Osipov

Reputation: 5843

Try INSERT WHERE NOT EXISTS, where a, b and c are relevant columns, @a, @b and @c are relevant values.

INSERT INTO table
(
    a,
    b,
    c
)
VALUES
(
    @a,
    @b,
    @c
)
WHERE NOT EXISTS
(
    SELECT 0 FROM table WHERE a = @a, b = @b, c = @c
)

Upvotes: 1

Related Questions