Martin
Martin

Reputation:

My SQL insert/update statement is too inefficient

I'm trying to write code for a batch import of lots of rows into the database.

Currently I bulk copy the raw data (from a .csv file) into a staging table, so that it's all at the database side. That leaves me with a staging table full of rows that identify 'contacts'. These now need to be moved into other tables of the database.

Next I copy over the rows from the staging table that I don't already have in the contacts table, and for the ones I do already have, I need to update the column named "GroupToBeAssignedTo", indicating a later operation I will perform.

I have a feeling I'm going about this wrong. The query isn't efficient and I'm looking for advice of how I could do this better.

update [t1]
set [t1].GroupToBeAssignedTo = [t2].GroupToBeAssignedTo from Contacts [t1]
inner join ContactImportStaging [t2] on [t1].UserID = [t2].UserID AND [t1].EmailAddress = [t2].EmailAddress AND [t2].GUID = @GUID 
where not exists 
(
   select GroupID, ContactID from ContactGroupMapping 
   where GroupID = [t2].GroupToBeAssignedTo AND ContactID = [t1].ID
)

Might it be better to just import all the rows without checking for duplicates first and then 'clean' the data afterwards? Looking for suggestions of where I'm going wrong. Thanks.

EDIT: To clarify, the question is regarding MS SQL.

Upvotes: 2

Views: 223

Answers (2)

Jeremy Smyth
Jeremy Smyth

Reputation: 23493

This answer is slightly "I wouldn't start from here", but it's the way I'd do it ;)

If you've got the Standard or Enterprise editions of MS SQL Server 2005, and you have access to SQL Server Integration Services, this kind of thing is a doddle to do with a Data Flow.

  • Create a data source linked to the CSV file (it's faster if it's sorted by some field)
  • ...and another to your existing contacts table (using ORDER BY to sort it by the same field)
  • Do a Merge Join on their common field -- you'll need to use a Sort transformation if either the two sources aren't already sorted
  • Do a Conditional split to focus only on rows that aren't already in your table (i.e. a table-unique field is "null", i.e. the merge join didn't actually merge for that row)
  • Use an OLEDB destination to input to the table.

Probably more individual steps than a single insert-with-select statement, but it'll save your staging, and it's pretty intuitive to follow. Plus, you're probably already licenced to use it, and it's pretty easy :)

Upvotes: 1

Quassnoi
Quassnoi

Reputation: 425341

Next I copy over the rows from the staging table that I don't already have in the contacts table

It seems that implies that ContactGroupMapping does not have records matching Contacts.id, in which case you can just omit the EXISTS:

UPDATE  [t1]
SET     [t1].GroupToBeAssignedTo = [t2].GroupToBeAssignedTo
FROM    Contacts [t1]
INNER JOIN
        ContactImportStaging [t2]
ON      [t1].UserID = [t2].UserID
        AND [t1].EmailAddress = [t2].EmailAddress
        AND [t2].GUID = @GUID 

Or I am missing something?

Upvotes: 0

Related Questions