CalMlynarczyk
CalMlynarczyk

Reputation: 705

Process each row and copy it to new table using C#

I have an MSSQL 2008 table with a few million records. I need to iterate over each row, modify some of the data, and copy the updated record to a new table using a C# application that gets executed on a daily basis.

I have tried doing this using ADO.NET entities, but there are memory issues involved with this method, not to mention it is very slow. I have read up on bulk-copy libraries and SQL-only ways for copying one table to another, but none of them involve modifying records before copying them. I need to find a better way for performing this operation.

Upvotes: 3

Views: 683

Answers (4)

deroby
deroby

Reputation: 6002

As you mention memory issues I'm guessing you're trying to load the million rows into memory, process them and then write them back to the database. You can avoid this by 'streaming' the data instead of loading it entirely. The SqlDataReader will handle buffering for you so on the reading side you can do a simple WHILE loop that fetches rows one by one. The actual conversion you already have working it seems so all you need to do is take care of writing the results back into the database. IMHO the fastest way to do so is by storing a buffer of multiple results (start with 100, work up and see where the sweet spot is) in a data-table and then push that data-table into the database using the SqlBulkCopy class. Rinse & repeat.

PS: Sounds like a 'fun' problem. Do you have any sample data sitting somewhere to test this out ? 5 hours sounds like a LONG time for something that looks trivial at first, then again 20 million times virtually nothing still adds up. More specifically I wonder how 'large' the data is on the RTF side : are the values ca 2k on average or rather 200k? And what kind of hardware do you run this on ?

Upvotes: 3

Remus Rusanu
Remus Rusanu

Reputation: 294267

Use SSIS. Schedule a daily job that does your transformation and runs the SSIS package. This will take care of batching and memory consumption, and will offer a few fast connectors for the read and write of data. You can embed your custom C# code (the RTF stripping into pure text) as an SSIS component, see Developing Custom Objects for Integration Services.

Upvotes: 1

Daniel Szabo
Daniel Szabo

Reputation: 7281

Checking around the internet, it looks like Microsoft's official answer to converting rich to plain text is to load the data into a RichTextBox control and then pull it out with the RichTextBox.Text property. That sucks for a lot of reasons, but mostly because it means you're going to have to get your hands dirty. Your best bet is to write a small app that invokes the RichTextBox control and passes all of your data to/from the database (using the SqlDataReader should alleviate the memory issues you mentioned).

Just as a matter of process - I would suggest building an intermediary table that your "cleansed" data rows get dumped into before appending them to your production table. Once you get the stored proc figured out just right, you can create a trigger that automatically invokes your stored proc every time a record gets added to your dirty table. This will ultimately eliminate the need to run your program every day to move records, as the trigger will make sure it happens "on the fly".

Edit - one last thought

It occurred to me that you might not be comfortable writing stored procedures and triggers, which is ok. A more "programmatic" solution would be to kick all of the files in your dirty table out to a delimited text file, which can easily be downloaded and parsed. Once you have the text file, you could manipulate it with your app (read it, cleanse it, create a cleansed file..what have you) and then upload for reading back into your database. Depending on your comfort/background/skill level, this might actually be the better solution to get the job done.

Hope this helps!

Upvotes: 1

BluesRockAddict
BluesRockAddict

Reputation: 15683

The fastest performing option would be to re-write your C# application logic into a CLR stored procedure so that all processing takes place on the server.

Upvotes: 3

Related Questions