BellyMark

Reputation: 127

VB6 / MS Access database: editing a large number of records

I need to process hundreds of thousands of records with VB6 and an MS Access database. I iterate through the recordset and edit each record, but this takes a long time. Creating a database with the same number of records using the AddNew and Update methods is much faster.

I will greatly appreciate if anyone shows me any code sample or just a strategy.

Here is the code:

Data1(1).RecordSource = "Select * from TABLE order by Field_A ASC"
Data1(1).Refresh
If Data1(1).Recordset.RecordCount > 0 Then
    Data1(1).Recordset.MoveFirst
    Do
        Data1(1).Recordset.Edit                         ' put the current record in edit mode
        Data1(1).Recordset.Fields("FIELD") = Sort_Value
        Data1(1).Recordset.Update                       ' write the record back immediately
        Data1(1).Recordset.MoveNext
    Loop Until Data1(1).Recordset.EOF = True
End If

It is really quite simple. The real thing is, I forgot to mention, that the HDD of the computer constantly reads/writes. This is actually the problem: with such a heavy load there is no way not to affect performance.

I first thought that the recordset generated by the query causes this problem (keep in mind that we have 1-2 million records). I guess it resides in some temporary place on the hard drive rather than in RAM, so executing .Edit and .Update might first have to position the cursor at the right place and then write.

I don't know for sure. Hopefully an expert can show me a way out.

Btw, I also tried replacing the Loop Until Data1(1).Recordset.EOF = True statement with a fixed-length cycle, because I read that checking Recordset.EOF on every pass also slows down performance.

Thank you in advance!

Upvotes: 6

Views: 7236

Answers (4)

Izhar Aazmi

Reputation: 935

To enhance performance you can use the UpdateBatch method of the ADODB Recordset object. Using this feature requires:

  1. adOpenStatic CursorType and
  2. adLockBatchOptimistic LockType

on the recordset object.
Additionally, you can use the adUseClient CursorLocation to put the load on the client instead of the server during the operation.

To go further, refrain from using the rec.EOF test; instead, use a For loop running from 1 to rec.RecordCount.

Worth mentioning: rec.RecordCount is unreliable until ADODB has traversed all records, so do a MoveLast and a MoveFirst to ensure a correct record count.

Use the following code as a hint:

Set con = CreateObject("ADODB.Connection")
con.Provider = "Microsoft.Jet.OLEDB.4.0"
con.Open "MyShopDB.mdb"                      ' full path to the .mdb file
Set rec = CreateObject("ADODB.Recordset")
sql = "SELECT * FROM Employees"

rec.CursorLocation = adUseClient             ' work on a client-side copy
rec.CursorType = adOpenStatic
rec.LockType = adLockBatchOptimistic         ' defer writes until UpdateBatch

rec.Open sql, con

If Not rec.EOF Then            ' rescue the no-records situation
    rec.MoveLast               ' let it see all records
    rec.MoveFirst
End If

cnt = rec.RecordCount          ' avoid re-reading it in each loop test

If cnt > 0 Then
    For i = 1 To cnt
        rec.Fields("FIELD").Value = Sort_Value
        '...
        '...
        '...
        rec.MoveNext
    Next i
    rec.UpdateBatch            ' write all pending changes in one batch
End If

rec.Close
con.Close

It has been more than three years since I switched from VB to PHP, so I might be missing some details here. Please note: I have not executed this code; it may contain minor issues, but it should be sufficient for indication purposes.

You can also try splitting the batch into smaller fragments to see the effect on performance; a sketch follows.
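
A minimal sketch of that idea, reusing the rec and cnt from the code above; CHUNK_SIZE is an illustrative assumption, and UpdateBatch flushes only the changes still pending, so calling it periodically keeps each write burst small:

Const CHUNK_SIZE = 5000            ' illustrative fragment size, tune as needed
For i = 1 To cnt
    rec.Fields("FIELD").Value = Sort_Value
    rec.MoveNext
    If i Mod CHUNK_SIZE = 0 Then
        rec.UpdateBatch            ' flush this fragment of pending edits
    End If
Next i
rec.UpdateBatch                    ' flush whatever remains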

Upvotes: 0

Clon

Reputation: 1055

While it may be necessary in some cases, iterating through recordsets in order to update a field should be avoided.

The sensible thing to do, which is much more effective, would be to write an SQL update query.

If your table is this big, you must be careful with the selection of your indexes, especially the primary key.

Then, you can divide the data based on your PK, and update all the records in the first set, then in the second, third...

 UPDATE super_big_table
 SET Field_A = some_function_to_make_it_sort_value
 WHERE myPrimaryKey BETWEEN left_boundary AND right_boundary

You repeat this (in code) for each of the divisions you have made in your table; a sketch of that driver loop follows.
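
A hedged VB6/DAO sketch of that driver loop, assuming a numeric primary key running from 1 to about 2 million (per the question) and an illustrative division size; the table, field, and function names are the placeholders from the query above, and the database path is hypothetical:

Dim db As DAO.Database
Dim lo As Long, hi As Long
Const CHUNK As Long = 100000                  ' illustrative division size, tune as needed
Set db = DBEngine.OpenDatabase("C:\path\to\database.mdb")   ' hypothetical path
For lo = 1 To 2000000 Step CHUNK
    hi = lo + CHUNK - 1
    db.Execute "UPDATE super_big_table " & _
               "SET Field_A = some_function_to_make_it_sort_value " & _
               "WHERE myPrimaryKey BETWEEN " & lo & " AND " & hi, dbFailOnError
Next lo
db.Close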

Now, the question is: can you think of an Access function which creates the required Sort_Value?

Note that if Field_A is your primary key, you shouldn't change it; otherwise your whole table would be rearranged each time you updated a few records, which would be a lot of work for your HDD and processor. In this case, you should have a different PK and create an index on Field_A instead.
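
The index itself can also be created from code; a sketch assuming an open DAO Database object named db and a hypothetical index name:

' One-time DDL; idxFieldA is a hypothetical name for the new index
db.Execute "CREATE INDEX idxFieldA ON super_big_table (Field_A)", dbFailOnError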

Upvotes: 1

Tom Collins

Reputation: 4069

My only suggestion, which may not work in your case, is to do mass updates using an update query.

Three cases where this could work:

If Sort_Value can be calculated from the other fields, that's a simple UPDATE query, but I'm sure you would have already seen that.

If Sort_Value can be calculated from other records (like previous record), then you can probably write a more complex UPDATE query (I've seen some pretty complex queries posted here).

Lastly, if the same Sort_Value is being applied to a lot of records, then you can issue an UPDATE query based on those records. So, if Sort_Value could be 10 different values, then all your updates would be done in 10 UPDATE queries.
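
A hedged DAO sketch of that last case, assuming hypothetical names throughout: a table tblData, a numeric field Sort_Key that determines which Sort_Value a row should get, and a helper function LookupSortValue that maps a key to its value:

Dim db As DAO.Database
Dim rsVals As DAO.Recordset
Set db = CurrentDb
' One UPDATE per distinct key instead of a million row-by-row edits
Set rsVals = db.OpenRecordset("SELECT DISTINCT Sort_Key FROM tblData")
Do Until rsVals.EOF
    db.Execute "UPDATE tblData SET Sort_Value = " & LookupSortValue(rsVals!Sort_Key) & _
               " WHERE Sort_Key = " & rsVals!Sort_Key, dbFailOnError
    rsVals.MoveNext
Loop
rsVals.Close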


If you tell us where you get the Sort_Value, we might be able to help you further.


Here are some things that DO NOT work to speed up the edit/update commands, according to my testing. This was all done using a table of 10,000 records, with 1,000,000 updates.

  • RS(1) instead of RS("name"). This was suggested on another site and actually increased time by 20%. (25 sec / 21 sec)
  • BeginTrans/CommitTrans made no difference on an un-indexed field, and was 1% faster on an indexed field. (un-indexed: 11 sec [w/ trans] / 11 sec, indexed: 23 sec [w/ trans] / 25 sec) *
  • Individual SQL statements. (86 sec)
  • Parameter querydef. (43 sec)

*Corrected result.


Code for the BeginTrans/CommitTrans test.

Sub CommitTest()
   Dim C As String
   Dim I As Long
   Dim J As Long
   Dim RS As Recordset
   Dim BegTime As Date
   Dim EndTime As Date
   BegTime = Now()
   Set RS = CurrentDb.OpenRecordset("tblTest")
   For J = 1 To 200                           ' 200 passes of 5,000 = 1,000,000 updates
      RS.MoveFirst
      DBEngine.Workspaces(0).BeginTrans
      For I = 1 To 5000
         C = Chr(Int(Rnd() * 26 + 66))        ' random character to write
         RS.Edit
         RS("coltest") = C
         RS.Update
         RS.MoveNext
      Next I
      DBEngine.Workspaces(0).CommitTrans      ' commit each pass as one transaction
   Next J
   EndTime = Now()
   Debug.Print DateDiff("s", BegTime, EndTime)
End Sub

Upvotes: 1

ashareef

Reputation: 1846

I created a table called test with the fields n and f(n).

I timed three different update subroutines:

  • recordset without a transaction
  • recordset with a transaction
  • update query

Sub updateFunction_noTrans()
    Dim rs As Recordset
    Set rs = CurrentDb.OpenRecordset("test")
    rs.MoveFirst
    Do Until rs.EOF
        rs.Edit
        rs("f(n)") = rs("n") + 1   ' each row is written back immediately
        rs.Update
        rs.MoveNext
    Loop
    rs.Close
End Sub

This is basically what you are doing: a straight recordset walk, editing a field.

Sub updateFunction_yesTrans()
    Dim i As Long
    Dim commitSize As Long
    Dim rs As Recordset
    commitSize = 5000              ' records committed per transaction
    Set rs = CurrentDb.OpenRecordset("test")
    DBEngine.Workspaces(0).BeginTrans
    rs.MoveFirst
    Do Until rs.EOF
        rs.Edit
        rs("f(n)") = rs("n") + 1
        rs.Update
        rs.MoveNext
        i = i + 1
        If i = commitSize Then     ' flush every commitSize records
            DBEngine.Workspaces(0).CommitTrans
            DBEngine.Workspaces(0).BeginTrans
            i = 0
        End If
    Loop
    DBEngine.Workspaces(0).CommitTrans
    rs.Close
End Sub

This is the same idea but with transactions. I commit 5,000 records at a time, as there is a limit of around 9,000-10,000 locks per commit; I believe you can change this in the registry (the Jet engine's MaxLocksPerFile setting, which defaults to 9,500).
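
If editing the registry is not an option, there is a per-session alternative (assuming DAO 3.6, whose DBEngine.SetOption accepts the dbMaxLocksPerFile constant):

' Raise the Jet lock limit for this session only; the registry default is 9,500
DBEngine.SetOption dbMaxLocksPerFile, 100000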

Sub updateFunction_updateQuery()
    CurrentDb.Execute ("UPDATE test SET test.[f(n)] = [n]+1;")
End Sub

This is faster than either of the recordset methods. For example, on around 2 million records it took ~20 seconds without transactions, ~18-19 seconds with transactions, and ~14 seconds with the update query.

This is all under the assumption that the field to be updated depends on values calculated from other fields within that record.

Really speeding up these kinds of actions depends on the situation; if this assumption doesn't apply, more detail is needed.

Edit: tested on an old Core 2 Duo machine, with no indices on the fields.

Upvotes: 2
