Display Name
Display Name

Reputation: 318

Fastest way to compare CSV file to database in c#

I am writing an internal application and one of the functions will be importing data from a remote system. The data from the remote system comes over as a CSV file. I need to compare the data in my system with that of the CSV file.

I need to apply any changes to my system (Adds and Changes). I need to track each field that is changed.

My database is normalized so I'm dealing with about 10 tables to correspond with the data in the CSV file. What is the best way to implement this? Each CSV file has about 500,000 records that are processed daily. I started by querying row by row from my SQL database using a lookup ID then using c# do do a field by field compare and updating or inserting as necessary; however, this takes way too long.

Any suggestions?

Upvotes: 1

Views: 3608

Answers (3)

peterm
peterm

Reputation: 92785

You can do following:

  • Load cvs file into staging table in your db;
  • Perform validation and clean-up routines on it (if necessary)
  • Perform your comparisons and updates on your live data
  • Wipe out all data from staging table

Using that approach you can implement almost all clean-up, validation, and update logic using your RDBMS functionality.

If your RDBMS is SQL Server you can leverage SQL Server Integration Services.

Upvotes: 3

Eric J.
Eric J.

Reputation: 150108

If you have anything that serves as a unique key, you can do the following:

  • Create a new table Hashes that contains a unique key and a hash of all fields associated with that key (do not use .NET's object.GetHashCode(), as the value returned does change from time to time by design. I personally use Google's CityHash which I ported to C#).
  • When you get a new CSV file, compute the hash value for each key
  • Check the Hashes table for each row in the CSV file.
  • If there is no entry for the unique key, create one and insert the row.
  • If there is an entry, see if the hash has changed.
  • If it has, update the hash in the Hashes table and update data.

Upvotes: 2

Dale M
Dale M

Reputation: 2473

Expanding on the first comment to your question.

Create an appropriately indexed table that matches the format of your csv file and dump the data straight into it.

Have a stored procedure with appropriate queries to update/delete/insert to the active tables.

Get rid of the temporary table.

Upvotes: 0

Related Questions