Reputation: 127
I need to process hundreds of thousands of records with VB6 and an MS Access database. I iterate through the recordset and edit each record, but it takes a very long time. Creating a database with the same number of records using the AddNew and Update methods is much faster.
I will greatly appreciate if anyone shows me any code sample or just a strategy.
Here is the code:
Data1(1).RecordSource = "Select * from TABLE order by Field_A ASC"
Data1(1).Refresh

If Data1(1).Recordset.RecordCount > 0 Then
    Data1(1).Recordset.MoveFirst
    Do
        Data1(1).Recordset.Edit
        Data1(1).Recordset.Fields("FIELD") = Sort_Value
        Data1(1).Recordset.Update
        Data1(1).Recordset.MoveNext
    Loop Until Data1(1).Recordset.EOF = True
End If
It is really quite simple. What I forgot to mention is that the HDD of the computer constantly reads/writes, and this is actually the problem. With such a heavy load there is no way it does not affect performance.
I first thought that the recordset generated by the query causes this (keep in mind that we have 1-2 million records). I guess it resides in some temporary place on the hard drive and not in RAM, so executing .Edit and .Update might first have to position the cursor at the right place and only then write.
I don't know for sure. Hopefully an expert can show me a way out.
Btw, I also tried replacing the Loop Until Data1(1).Recordset.EOF = True statement with a fixed-length loop, because I read that checking Recordset.EOF also slows down performance.
Thank you in advance!
Upvotes: 6
Views: 7236
Reputation: 935
To enhance performance you can use the UpdateBatch method of the ADODB Recordset object. To use this feature, the recordset must be opened with the adOpenStatic CursorType and the adLockBatchOptimistic LockType.
Additionally, you can use the adUseClient CursorLocation to put the load on the client instead of the server during the operation.
To go further, refrain from using the rec.EOF test; instead use a For loop from 1 to rec.RecordCount.
Worth mentioning: rec.RecordCount is unreliable until ADODB has traversed all the records, so do a MoveLast and a MoveFirst first to ensure a correct record count.
Use the following code as a hint:
Set con = CreateObject("ADODB.Connection")
con.Provider = "Microsoft.Jet.OLEDB.4.0"
con.Open "Data Source=MyShopDB.mdb"    ' adjust to the path of your .mdb

Set rec = CreateObject("ADODB.Recordset")
sql = "SELECT * FROM Employees"
rec.CursorLocation = adUseClient
rec.CursorType = adOpenStatic
rec.LockType = adLockBatchOptimistic
rec.Open sql, con

If Not rec.EOF Then          ' rescue the no-records situation
    rec.MoveLast             ' let it see all records
    rec.MoveFirst
End If

cnt = rec.RecordCount        ' avoid reading it in every loop test

If cnt > 0 Then
    For i = 1 To cnt
        rec.Fields("FIELD").Value = Sort_Value
        '...
        '...
        '...
        rec.MoveNext
    Next i
    rec.UpdateBatch          ' write all pending changes in one batch
End If

rec.Close
con.Close
It has been more than 3 years since I switched from VB to PHP, so I might be missing some details here. Please note: I have not executed this code; it may contain minor issues, but it should be sufficient as an indication.
You can also try splitting the batch into smaller fragments to see the effect on performance.
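As a hedged sketch of that idea (reusing rec, cnt and Sort_Value from the code above; the fragment size of 5000 is purely hypothetical), you would call UpdateBatch every time the counter reaches the fragment size instead of only once at the end:

' Sketch only: flush pending changes every batchSize rows instead of once at the end.
' Assumes the batch-optimistic, client-side recordset opened above.
batchSize = 5000                   ' hypothetical fragment size - tune and measure
For i = 1 To cnt
    rec.Fields("FIELD").Value = Sort_Value
    If i Mod batchSize = 0 Then
        rec.UpdateBatch            ' send this fragment of changes to the database
    End If
    rec.MoveNext
Next i
rec.UpdateBatch                    ' send whatever remains in the last fragment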
Upvotes: 0
Reputation: 1055
While it may be necessary in some cases, iterating through recordsets in order to update a field should be avoided.
The sensible thing to do, which is much more effective, would be to write an SQL update query.
If your table is that big, you must be careful with the choice of your indexes, especially the primary key.
Then, you can divide the data based on your PK, and update all the records in the first set, then in the second, third...
UPDATE super_big_table
SET Field_A = some_function_to_make_it_sort_value
WHERE myPrimaryKey BETWEEN left_boundary AND right_boundary
You repeat this (by code) for all the divisions you have made in your table.
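A minimal VB6/DAO sketch of that chunked approach, assuming a numeric primary key column named myPrimaryKey and a hypothetical chunk size; some_function_to_make_it_sort_value is still the placeholder from the query above and must be replaced with a real expression:

' Sketch only: run the UPDATE in primary-key ranges instead of row by row.
Dim db As DAO.Database
Dim maxKey As Long, lowKey As Long, highKey As Long, chunkSize As Long

Set db = DBEngine.OpenDatabase("MyShopDB.mdb")   ' adjust to your .mdb path
maxKey = db.OpenRecordset("SELECT MAX(myPrimaryKey) AS m FROM super_big_table")("m")
chunkSize = 100000                               ' hypothetical - tune and measure

lowKey = 1
Do While lowKey <= maxKey
    highKey = lowKey + chunkSize - 1
    db.Execute "UPDATE super_big_table " & _
               "SET Field_A = some_function_to_make_it_sort_value " & _
               "WHERE myPrimaryKey BETWEEN " & lowKey & " AND " & highKey, dbFailOnError
    lowKey = highKey + 1
Loop
db.Close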
Now, the question is- can you think of an Access function which creates the required Sort_value?
Note that if Field_A is your primary key, then you shouldn't change it. Otherwise, your whole table would be rearranged each time you updated a few records, which would be a lot of work for your HDD / processor. In that case, you should have a different primary key and create an index on Field_A instead.
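As a small sketch of that index (table and field names taken from the query above), Jet accepts a one-line DDL statement:

' Sketch only: a non-unique index on Field_A, separate from the primary key.
CurrentDb.Execute "CREATE INDEX idxFieldA ON super_big_table (Field_A);"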
Upvotes: 1
Reputation: 4069
My only suggestion, which may not work in your case, is to do mass updates using an update query.
Three cases where this could work:
If Sort_Value can be calculated from the other fields, that's a simple UPDATE query, but I'm sure you would have already seen that.
If Sort_Value can be calculated from other records (like previous record), then you can probably write a more complex UPDATE query (I've seen some pretty complex queries posted here).
Lastly, if the same Sort_Value is being applied to a lot of records, then you can issue an UPDATE query based on those records. So, if Sort_Value could be 10 different values, then all your updates would be done in 10 UPDATE queries.
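For that last case, here is a hedged sketch in DAO; Category and GetSortValue are purely hypothetical stand-ins for whatever determines the Sort_Value and however you compute it:

' Sketch only: one UPDATE per distinct group instead of one Edit/Update per record.
' Category and GetSortValue are hypothetical; add quoting if the values are text.
Dim db As DAO.Database
Dim rsGroups As DAO.Recordset

Set db = CurrentDb
Set rsGroups = db.OpenRecordset("SELECT DISTINCT Category FROM [TABLE]")
Do Until rsGroups.EOF
    db.Execute "UPDATE [TABLE] SET [FIELD] = " & GetSortValue(rsGroups("Category")) & _
               " WHERE Category = " & rsGroups("Category"), dbFailOnError
    rsGroups.MoveNext
Loop
rsGroups.Close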
If you tell us where you get the Sort_Value, we might be able to help you further.
Here are some things that DO NOT work to speed up the Edit/Update commands, according to my testing. This was all done using a table of 10,000 records, with 1,000,000 updates.
Code for the BeginTrans/CommitTrans test.
Sub CommitTest()
    Dim C As String
    Dim I As Long
    Dim J As Long
    Dim RS As Recordset
    Dim BegTime As Date
    Dim EndTime As Date

    BegTime = Now()
    Set RS = CurrentDb.OpenRecordset("tblTest")

    For J = 1 To 200                        ' 200 transactions of 5000 updates each = 1,000,000 updates
        RS.MoveFirst
        DBEngine.Workspaces(0).BeginTrans
        For I = 1 To 5000
            C = Chr(Int(Rnd() * 26 + 66))   ' a pseudo-random character to write
            RS.Edit
            RS("coltest") = C
            RS.Update
            RS.MoveNext
        Next I
        DBEngine.Workspaces(0).CommitTrans
    Next J

    EndTime = Now()
    Debug.Print DateDiff("s", BegTime, EndTime)
End Sub
Upvotes: 1
Reputation: 1846
I created a table called test with the fields n and f(n), and timed 3 different update subroutines: a recordset without a transaction, a recordset with a transaction, and an update query.
Sub updateFunction_noTrans()
    Dim rs As Recordset

    Set rs = CurrentDb.OpenRecordset("test")
    rs.MoveFirst
    Do Until rs.EOF
        rs.Edit
        rs("f(n)") = rs("n") + 1
        rs.Update
        rs.MoveNext
    Loop
End Sub
This is basically what you are doing: a straight recordset loop that edits a field.
Sub updateFunction_yesTrans()
    Dim i As Long
    Dim commitSize As Long
    Dim rs As Recordset

    commitSize = 5000
    Set rs = CurrentDb.OpenRecordset("test")

    DBEngine.Workspaces(0).BeginTrans
    rs.MoveFirst
    Do Until rs.EOF
        rs.Edit
        rs("f(n)") = rs("n") + 1
        rs.Update
        rs.MoveNext
        i = i + 1
        If i = commitSize Then
            DBEngine.Workspaces(0).CommitTrans
            DBEngine.Workspaces(0).BeginTrans
            i = 0
        End If
    Loop
    DBEngine.Workspaces(0).CommitTrans
End Sub
This is the same idea but with transactions. I commit 5,000 records at a time because there is a limit of around 9,000-10,000 locks per commit; I believe you can change this in the registry (the Jet MaxLocksPerFile setting).
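If you would rather not edit the registry, DAO also lets you raise that limit for the current session; a hedged sketch (the 200,000 figure is arbitrary):

' Sketch only: raise Jet's lock limit at runtime instead of via the registry.
' dbMaxLocksPerFile corresponds to the MaxLocksPerFile registry value (default 9500).
DBEngine.SetOption dbMaxLocksPerFile, 200000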
Sub updateFunction_updateQuery()
    CurrentDb.Execute ("UPDATE test SET test.[f(n)] = [n]+1;")
End Sub
This is faster than any of the recordset methods. E.g. on around 2 million records it took ~20 seconds without transactions, ~18-19 seconds with transactions, ~14 seconds with the update query.
This all assumes that the field to be updated depends on values calculated from other fields within the same record.
How to really speed up these kinds of actions depends on the situation; more detail is needed if that assumption doesn't apply.
Edit: this was run on an old Core 2 Duo machine, with no indices on the fields.
Upvotes: 2