Washington Guedes
Washington Guedes

Reputation: 4365

Performance alternative over Scripting.Dictionary

I am coding a Manager in Excel-VBA with several buttons.

One of them is to generate a tab using another Excel file (let me call it T) as input.

Some properties of T:

Main relation:

I am generating a new Excel tab with data from T of the last year/month grouped by Seller.

Important notes:


This is enough, now you know what I have already coded.

My code works, but, it takes about 4 minutes of runtime.

I have already coded some other buttons using smaller sources (not greater than 2MB) which runs in 5 seconds.

Considering T size, 4 minutes runtime could be acceptable.

But I'm not proud of it, at least not yet.


My code is mainly based on Scripting.Dictionary to map data from T, and then I use for each key in obj ... next key to set the grouped data to the new created tab.

I'm not sure, but here are my thoughts:


I want to be wrong with my thoughts. But if I'm not wrong, my next step (and last hope) to reduce the runtime of this function is to code my own class object with Tries.

I will only start coding tomorrow, what I want is just some confirmation if I am in the right way, or some advices if I am in the wrong way of doing it.

Do you have any suggestions? Thanks in advance.

Upvotes: 0

Views: 2100

Answers (2)

Jan Hornak
Jan Hornak

Reputation: 36

I'm convinced that you've already found the right solution because there wasn't any update for last two years.

Anyhow, I want to mention (maybe it will help someone else) that your bottleneck isn't the Dictionary or Binary Tree. Even with millions of rows the processing in memory is blazingly fast if you have sufficient amount of RAM.

The botlleneck is usually the reading of data from worksheet and writing it back to the worksheet. Here the arrays come very userfull.

Just read the data from worksheet into the Variant Array. You don't have to work with that array right away. If it is more comfortable for you to work with dictionary, just transfer all the data from array into dictionary and work with it. Since this process is entirely made in memory, don't worry about the performance penalisation.

When you are finished with data processing in dictionary, put all data from dictionary back to the array and write that array into a new worksheet at one shot.

Worksheets("New Sheet").Range("A1").Value = MyArray

I'm pretty sure it will take only few seconds

Upvotes: 1

Washington Guedes
Washington Guedes

Reputation: 4365

Memory Limit Exceeded

In short:

  • The main problem was because I used a dynamic programming approach of storing information (preprocessing) to make the execution time faster.
  • My code now runs in ~ 13 seconds.

There are things we learn the hard way. But I'm glad I found the answer.

  • Using the Task Manager I was able to see my code reaching 100% memory usage.
  • The DP approach I mentioned above using Scripting.Dictionary reached 100% really faster.
  • The DP approach I mentioned above using my own cls_trie implementation also reached 100%, but later than the first.
  • This explains the ~4-5 min compared to ~2-3 min total runtime of above attempts.
  • In the Task Manager I could also see that the CPU usage never hited 2%.

Solution was simple, I had to balance CPU and Memory usages.

  • I changed some DP approaches to simple for-loops with if-conditions.
  • The CPU usage now hits ~15%.
  • The Memory usage now hits ~65%.
  • I know this is relative to the CPU and Memory capacity of each machine. But in the client machine it is also running in no more than 15 seconds now.

I created one GitHub repository with my cls_trie implementation and added one excel file with an example usage.

I'm new to the excel-vba world (4 months working with it right now). There might probably have some ways to improve my cls_trie implementation, I'm openned to suggestions:

Option Explicit

Public Keys As Collection
Public Children As Variant
Public IsLeaf As Boolean

Public tObject As Variant
Public tValue As Variant

Public Sub Init()
    Set Keys = New Collection
    ReDim Children(0 To 255) As cls_trie
    IsLeaf = False

    Set tObject = Nothing
    tValue = 0
End Sub

Public Function GetNodeAt(index As Integer) As cls_trie
    Set GetNodeAt = Children(index)
End Function

Public Sub CreateNodeAt(index As Integer)
    Set Children(index) = New cls_trie
    Children(index).Init
End Sub

'''
'Following function will retrieve node for a given key,
'creating a entire new branch if necessary
'''
Public Function GetNode(ByRef key As Variant) As cls_trie
    Dim node As cls_trie
    Dim b() As Byte
    Dim i As Integer
    Dim pos As Integer

    b = CStr(key)
    Set node = Me

    For i = 0 To UBound(b) Step 2
        pos = b(i) Mod 256

        If (node.GetNodeAt(pos) Is Nothing) Then
            node.CreateNodeAt pos
        End If

        Set node = node.GetNodeAt(pos)
    Next

    If (node.IsLeaf) Then
        'already existed
    Else
        node.IsLeaf = True
        Keys.Add key
    End If

    Set GetNode = node
End Function

'''
'Following function will get the value for a given key
'Creating the key if necessary
'''
Public Function GetValue(ByRef key As Variant) As Variant
    Dim node As cls_trie
    Set node = GetNode(key)
    GetValue = node.tValue
End Function

'''
'Following sub will set a value to a given key
'Creating the key if necessary
'''
Public Sub SetValue(ByRef key As Variant, value As Variant)
    Dim node As cls_trie
    Set node = GetNode(key)
    node.tValue = value
End Sub

'''
'Following sub will sum up a value for a given key
'Creating the key if necessary
'''
Public Sub SumValue(ByRef key As Variant, value As Variant)
    Dim node As cls_trie
    Set node = GetNode(key)
    node.tValue = node.tValue + value
End Sub

'''
'Following function will validate if given key exists
'''
Public Function Exists(ByRef key As Variant) As Boolean
    Dim node As cls_trie
    Dim b() As Byte
    Dim i As Integer

    b = CStr(key)
    Set node = Me

    For i = 0 To UBound(b) Step 2
        Set node = node.GetNodeAt(b(i) Mod 256)

        If (node Is Nothing) Then
            Exists = False
            Exit Function
        End If
    Next

    Exists = node.IsLeaf
End Function

'''
'Following function will get another Trie from given key
'Creating both key and trie if necessary
'''
Public Function GetTrie(ByRef key As Variant) As cls_trie
    Dim node As cls_trie
    Set node = GetNode(key)

    If (node.tObject Is Nothing) Then
        Set node.tObject = New cls_trie
        node.tObject.Init
    End If

    Set GetTrie = node.tObject
End Function

You can see in the above code:

  • I hadn't implemented any delete method because I didn't need it till now. But it would be easy to implement.
  • I limited myself to 256 children because in this project the text I'm working on is basically lowercase and uppercase [a-z] letters and numbers, and the probability that two text get mapped to the same branch node tends zero.

as a great coder said, everyone likes his own code even if other's code is too beautiful to be disliked [1]

My conclusion

  • I will probably never more use Scripting.Dictionary, even if it is proven that somehow it could be better than my cls_trie implementation.

Thank you all for the help.

Upvotes: 1

Related Questions