Steve W
Steve W

Reputation: 1128

Check if data exists (avoid duplication) in database before running some SQL Server code

I'm working on automating the import of CSV data to a database. At this time I have code that results in a table called #temp1 which looks like this:

CompetitionID   DateKickOff                TimeKickOff         TeamIDHome  TeamIDAway   ScoreHome   ScoreAway
--------------- -------------------------- ------------------- ----------- ------------ ----------- ------------
2               2013-03-08 00:00:00.000    14:02:00.0000000    21          43           0           4

At this point I need to check whether an identical combination of DateKickOff and TeamIDHome values exist in the database all ready, so as to avoid importing duplicate data. If this combination all ready exists on a row of table Match then I need to effectively skip the following code:

INSERT INTO Match
SELECT *
FROM #temp1

--Add MatchID column to Temp Data file and fill it with the most recent match ID
ALTER TABLE #CSVTest_Data
ADD MatchID INT

UPDATE #CSVTest_Data
SET MatchID = (SELECT TOP 1 MatchID
               FROM Match
               ORDER BY MatchID DESC)

INSERT INTO Data (MatchID,
                  OddsFirstTimeTaken, 
                  OddsLastTimeTaken, 
                  MarketName, 
                  Outcome, 
                  Odds, 
                  NumberOfBets, 
                  VolumeMatched, 
                  InPlay)
SELECT MatchID,
       FirstTimeTaken, 
       LatestTimeTaken, 
       Market, 
       Outcome, 
       Odds, 
       NumberOfBets, 
       VolumeMatched, 
       InPlay
FROM #CSVTest_Data

Obviously, if the data is not a duplicate entry then the code above needs to be run. I would really appreciate some help with this.

EDIT: Just to clarify, the comparison of data needs to take place before the 'INSERT INTO Match' code occurs. If the data is not duplicate, SQL Server will increment the Primary Key of MatchID in the Match table. I then get this new MatchID value and write it to my second temporary table before writing to my 'Data' table. If no new entry was added to the 'Match' table, then no data must be written to the 'Data' table.

Upvotes: 0

Views: 1098

Answers (3)

Steve W
Steve W

Reputation: 1128

This works by carrying out a separate check for duplicate data for each table:

INSERT INTO  Match    
SELECT *
FROM #temp1
EXCEPT
SELECT CompetitionID, DateKickOff, TimeKickOff, TeamIDHome, TeamIDAway, ScoreHome, ScoreAway
FROM Match

DELETE #CSVTest_Data
  FROM #CSVTest_Data d
WHERE EXISTS( SELECT * from Data d2 WHERE
    d.FirstTimeTaken = d2.OddsFirstTimeTaken AND
    d.LatestTimeTaken = d2.OddsLastTimeTaken AND
    d.Market = d2.MarketName AND
    d.Outcome = d2.Outcome AND
    d.Odds = d2.Odds AND
    d.NumberOfBets = d2.NumberOfBets AND
    d.VolumeMatched = d2.VolumeMatched AND
    d.InPlay = d2.InPlay)

--Add MatchID column to Temp Data file and fill it with the most recent match ID
ALTER TABLE #CSVTest_Data ADD MatchID INT
update #CSVTest_Data
     Set MatchID = (SELECT TOP 1 MatchID FROM BetfairFootballDB..Match
                   ORDER BY MatchID DESC)

INSERT INTO  BetfairFootballDB..Data (MatchID, OddsFirstTimeTaken, OddsLastTimeTaken, MarketName, Outcome, Odds, NumberOfBets, VolumeMatched, InPlay)
SELECT MatchID, FirstTimeTaken, LatestTimeTaken, Market, Outcome, Odds, NumberOfBets, VolumeMatched, InPlay
FROM #CSVTest_Data

Upvotes: 0

Moslem Ben Dhaou
Moslem Ben Dhaou

Reputation: 7005

You can use the EXCEPT keyword:

WITH NewData AS (
    SELECT FirstTimeTaken
         , LatestTimeTaken
         , Market
         , Outcome
         , Odds
         , NumberOfBets
         , VolumeMatched
         , InPlay
    FROM #CSVTest_Data -- Coming data

    EXCEPT --Minus

    SELECT FirstTimeTaken
         , LatestTimeTaken
         , Market
         , Outcome
         , Odds
         , NumberOfBets
         , VolumeMatched
         , InPlay
    FROM Data --Existing Data
)

INSERT INTO Data (OddsFirstTimeTaken, OddsLastTimeTaken, MarketName, Outcome, Odds, NumberOfBets, VolumeMatched, InPlay)
SELECT FirstTimeTaken, LatestTimeTaken, Market, Outcome, Odds, NumberOfBets, VolumeMatched, InPlay
FROM NewData --Insert New Data only

EDIT :

If you have primary identity keys, you don't need to use them at all even in insert statements. SQL Server will figure out how to handle them.

Upvotes: 3

AK47
AK47

Reputation: 3797

Try this,

INSERT INTO Data (MatchID,
                  OddsFirstTimeTaken, 
                  OddsLastTimeTaken, 
                  MarketName, 
                  Outcome, 
                  Odds, 
                  NumberOfBets, 
                  VolumeMatched, 
                  InPlay)
SELECT t.MatchID,
       t.FirstTimeTaken, 
       t.LatestTimeTaken, 
       t.Market, 
       t.Outcome, 
       t.Odds, 
       t.NumberOfBets, 
       t.VolumeMatched, 
       t.InPlay
FROM #CSVTest_Data as t
left join Data as d on t.DateKickOff = d.DateKickOff
            and t.TeamIDHome = d.TeamIDHome
where d.DateKickOff is null and d.TeamIDHome is null

Upvotes: 0

Related Questions