Kevin
Kevin

Reputation: 1536

Update table while reading

I'm writing a piece of code (VB.NET) to cleanse a (quite big) table of data.

I am connecting to my SQL database, looping through the table, cleansing the data and adding the cleansed data in a different column.

As i'm currently doing an update to my database for each record in the same loop as where i am cleansing the data, i am wondering if there is a more efficient way of doing this, where i would cleanse the data and afterwards send all the updated records to the database in one go.

Simplified code:

'Connect
SQLConn.ConnectionString = strConnection
SQLConn.Open()
SQLCmd.Connection = SQLConn
SQLConn2.ConnectionString = strConnection
SQLConn2.Open()
SQLCmd2.Connection = SQLConn2

'Set query
strSQL = "SELECT Column1 FROM Table1"
SQLCmd.CommandText = strSQL

'Load Query
SQLdr = SQLCmd.ExecuteReader

'Start Cleansing
While SQLdr.Read
    Cleansing()

'Add to database
    strSQL2 = "UPDATE Table1 SET Clean_data = '" & strClean & "' WHERE Dirty_Data = '" & SQLdr(0).ToString & "'"
    SQLCmd2.CommandText = strSQL2
    SQLCmd2.ExecuteNonQuery() 
End While

'Close Connections
SQLdr.Close()
SQLConn.Close()
SQLConn2.Close()

I'm guessing (from searching for a solution) that it is possible to do the update outside of my loop, but i can't seem to find how to do it specifically.

Many thanks!

Upvotes: 0

Views: 2422

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270773

Your code is taking a long time because the update is doing a full table scan for every record. You can speed it up by adding an index on the column "Dirty Data".

Essentially, you are reading the data in the select statement. Cleaning one row, and then updating it. The preferred "set-based" approach is more like: Ideally, you would like to do:

update table1
    set column1 = <fix the dirty data>
    where column1 <is dirty>

And you have some options in SQL, in terms of replace() and case and like (for instance) that can help with this process.

But you already have the cleaning code external to the database. For this, you want to create and open a cursor, process the record, and then write back. Cursors are relatively slow, compared to in-database operations. But, this is exactly the situation they were designed for -- external code to be applied to individual records.

Upvotes: 1

Related Questions