Reputation: 1536
I'm writing a piece of code (VB.NET) to cleanse a (quite big) table of data.
I am connecting to my SQL database, looping through the table, cleansing the data and adding the cleansed data in a different column.
As i'm currently doing an update to my database for each record in the same loop as where i am cleansing the data, i am wondering if there is a more efficient way of doing this, where i would cleanse the data and afterwards send all the updated records to the database in one go.
Simplified code:
'Connect
SQLConn.ConnectionString = strConnection
SQLConn.Open()
SQLCmd.Connection = SQLConn
SQLConn2.ConnectionString = strConnection
SQLConn2.Open()
SQLCmd2.Connection = SQLConn2
'Set query
strSQL = "SELECT Column1 FROM Table1"
SQLCmd.CommandText = strSQL
'Load Query
SQLdr = SQLCmd.ExecuteReader
'Start Cleansing
While SQLdr.Read
Cleansing()
'Add to database
strSQL2 = "UPDATE Table1 SET Clean_data = '" & strClean & "' WHERE Dirty_Data = '" & SQLdr(0).ToString & "'"
SQLCmd2.CommandText = strSQL2
SQLCmd2.ExecuteNonQuery()
End While
'Close Connections
SQLdr.Close()
SQLConn.Close()
SQLConn2.Close()
I'm guessing (from searching for a solution) that it is possible to do the update outside of my loop, but i can't seem to find how to do it specifically.
Many thanks!
Upvotes: 0
Views: 2422
Reputation: 1270793
Your code is taking a long time because the update
is doing a full table scan for every record. You can speed it up by adding an index on the column "Dirty Data".
Essentially, you are reading the data in the select statement. Cleaning one row, and then updating it. The preferred "set-based" approach is more like: Ideally, you would like to do:
update table1
set column1 = <fix the dirty data>
where column1 <is dirty>
And you have some options in SQL, in terms of replace()
and case
and like
(for instance) that can help with this process.
But you already have the cleaning code external to the database. For this, you want to create and open a cursor, process the record, and then write back. Cursors are relatively slow, compared to in-database operations. But, this is exactly the situation they were designed for -- external code to be applied to individual records.
Upvotes: 1