FMFF
FMFF

Reputation: 1718

Iterating through two identical data sources

I have data with the same schema in a pipe delimited text file and in a database table, including the primary key column.

I have to check if each row in the file is present in the table, if not generate an INSERT statement for that row.

The table has 30 columns, but here I've simplified for this example:

ID       Name    Address1    Address2    City    State    Zip

ID is the running identity column; so if a particular ID value from the file is found in the table, there should be no insert statement generated for that.

Here's my attempt, which doesn't feel correct:

 foreach (var item in RecipientsInFile)
        {
            if (!RecipientsInDB.Any(u => u.ID == item.ID ))
            {
               Console.WriteLine(GetInsertSql(item));
            }
        }
        Console.ReadLine();

EDIT: Sorry, I missed the asking the actual question; how to do this? Thank you very much for all the help.

EDIT: The table has a million plus rows, while the file has 50K rows. This a one time thing, not a permanent project.

Upvotes: 1

Views: 96

Answers (3)

p.campbell
p.campbell

Reputation: 100607

Try comparing the ID lists using .Except()

List<int> dbIDs = Recipients.Select(x=>x.ID).ToList();
List<int> fileIDs = RecipientsFile.Select(x=>x.ID).ToList();
List<int> toBeInserted = fileIDs.Except(dbIDs).ToList();

toBeInserted.ForEach(x=>GetInsertSqlStatementForID(x));

For the pedantic and trollish among us in the comments, please remember the above code (like any source code you find on the interwebs) shouldn't be copy/pasted into your production code. Try this refactoring:

foreach (var item in RecipientsFile.Select(x=>x.ID)
                                   .Except(DatabaseRecipients.Select(x=>x.ID)))
{
   GetInsertSqlStatementForID(item);
}

Upvotes: 2

Joe Enos
Joe Enos

Reputation: 40413

Lots of ways of accomplishing this. Yours is one way.

Another would be to always generate SQL, but generate it in the following manner:

if not exists (select 1 from Recipients where ID == 1234)
    insert Recipients (...) values (...)
if not exists (select 1 from Recipients where ID == 1235)
    insert Recipients (...) values (...)

Another would be to retrieve the entire contents of the database into memory beforehand, loading the database IDs into a HashSet, then only checking that HashSet to see if it exists - would take a little longer to get started, but would be faster for each record.

Any of these three techniques would work - it all depends on how big your database table is, and how big your file is. If they're both relatively small (maybe 10,000 records or so), then any of these should work fine.

EDIT

And there's always option D: Insert all records from the file into a temporary table (could be a real table or a SQL temp table, doesn't really matter) in the database, then use SQL to join the two tables together and retrieve the differences (using not exists or in or whatever technique you want), and insert the missing records that way.

Upvotes: 0

Xavier Delamotte
Xavier Delamotte

Reputation: 3599

I would add all the RecipientsInDB Ids in a HashSet and then test if the set contains the item Id.

 var recipientsInDBIds = new Hashset(RecipientsInDB.Select(u => u.ID));
 foreach (var item in RecipientsInFile)
    {
        if (!recipientsInDBIds.Contains(item.ID ))
        {
           Console.WriteLine(GetInsertSql(item));
        }
    }
    Console.ReadLine();

Upvotes: 2

Related Questions