Reputation: 4365
I am coding a Manager in Excel-VBA with several buttons.
One of them is to generate a tab using another Excel file (let me call it T
) as input.
Some properties of T
:
~90MB
size~350K
lines14
months (unordered).year/month
Main relation:
I am generating a new Excel tab with data from T
of the last year/month
grouped by Seller.
Important notes:
T
is the only available input/source.total-money
should be counted to all of those Sellers.This is enough, now you know what I have already coded.
My code works, but, it takes about 4 minutes of runtime.
I have already coded some other buttons using smaller sources (not greater than 2MB
) which runs in 5 seconds.
Considering T
size, 4 minutes runtime could be acceptable.
But I'm not proud of it, at least not yet.
My code is mainly based on Scripting.Dictionary
to map data from T
, and then I use for each key in obj ... next key
to set the grouped data to the new created tab.
I'm not sure, but here are my thoughts:
N
is the total keys in a Scripting.Dictionary
, and I need to check for obj.Exists(str)
before aggregating total-money
. It will run N
string compares to return false
.N
string compares when I do Set seller = obj(seller_name)
.I want to be wrong with my thoughts. But if I'm not wrong, my next step (and last hope) to reduce the runtime of this function is to code my own class object with Tries
.
I will only start coding tomorrow, what I want is just some confirmation if I am in the right way, or some advices if I am in the wrong way of doing it.
Do you have any suggestions? Thanks in advance.
Upvotes: 0
Views: 2100
Reputation: 36
I'm convinced that you've already found the right solution because there wasn't any update for last two years.
Anyhow, I want to mention (maybe it will help someone else) that your bottleneck isn't the Dictionary or Binary Tree. Even with millions of rows the processing in memory is blazingly fast if you have sufficient amount of RAM.
The botlleneck is usually the reading of data from worksheet and writing it back to the worksheet. Here the arrays come very userfull.
Just read the data from worksheet into the Variant Array. You don't have to work with that array right away. If it is more comfortable for you to work with dictionary, just transfer all the data from array into dictionary and work with it. Since this process is entirely made in memory, don't worry about the performance penalisation.
When you are finished with data processing in dictionary, put all data from dictionary back to the array and write that array into a new worksheet at one shot.
Worksheets("New Sheet").Range("A1").Value = MyArray
I'm pretty sure it will take only few seconds
Upvotes: 1
Reputation: 4365
Memory Limit Exceeded
In short:
~ 13 seconds
.There are things we learn the hard way. But I'm glad I found the answer.
Scripting.Dictionary
reached 100% really faster.cls_trie
implementation also reached 100%, but later than the first.~4-5 min
compared to ~2-3 min
total runtime of above attempts.2%
.Solution was simple, I had to balance CPU and Memory usages.
for-loops
with if-conditions
. ~15%
.~65%
.15 seconds
now.I created one GitHub repository with my cls_trie implementation and added one excel file with an example usage.
I'm new to the excel-vba world (4 months working with it right now). There might probably have some ways to improve my cls_trie implementation, I'm openned to suggestions:
Option Explicit
Public Keys As Collection
Public Children As Variant
Public IsLeaf As Boolean
Public tObject As Variant
Public tValue As Variant
Public Sub Init()
Set Keys = New Collection
ReDim Children(0 To 255) As cls_trie
IsLeaf = False
Set tObject = Nothing
tValue = 0
End Sub
Public Function GetNodeAt(index As Integer) As cls_trie
Set GetNodeAt = Children(index)
End Function
Public Sub CreateNodeAt(index As Integer)
Set Children(index) = New cls_trie
Children(index).Init
End Sub
'''
'Following function will retrieve node for a given key,
'creating a entire new branch if necessary
'''
Public Function GetNode(ByRef key As Variant) As cls_trie
Dim node As cls_trie
Dim b() As Byte
Dim i As Integer
Dim pos As Integer
b = CStr(key)
Set node = Me
For i = 0 To UBound(b) Step 2
pos = b(i) Mod 256
If (node.GetNodeAt(pos) Is Nothing) Then
node.CreateNodeAt pos
End If
Set node = node.GetNodeAt(pos)
Next
If (node.IsLeaf) Then
'already existed
Else
node.IsLeaf = True
Keys.Add key
End If
Set GetNode = node
End Function
'''
'Following function will get the value for a given key
'Creating the key if necessary
'''
Public Function GetValue(ByRef key As Variant) As Variant
Dim node As cls_trie
Set node = GetNode(key)
GetValue = node.tValue
End Function
'''
'Following sub will set a value to a given key
'Creating the key if necessary
'''
Public Sub SetValue(ByRef key As Variant, value As Variant)
Dim node As cls_trie
Set node = GetNode(key)
node.tValue = value
End Sub
'''
'Following sub will sum up a value for a given key
'Creating the key if necessary
'''
Public Sub SumValue(ByRef key As Variant, value As Variant)
Dim node As cls_trie
Set node = GetNode(key)
node.tValue = node.tValue + value
End Sub
'''
'Following function will validate if given key exists
'''
Public Function Exists(ByRef key As Variant) As Boolean
Dim node As cls_trie
Dim b() As Byte
Dim i As Integer
b = CStr(key)
Set node = Me
For i = 0 To UBound(b) Step 2
Set node = node.GetNodeAt(b(i) Mod 256)
If (node Is Nothing) Then
Exists = False
Exit Function
End If
Next
Exists = node.IsLeaf
End Function
'''
'Following function will get another Trie from given key
'Creating both key and trie if necessary
'''
Public Function GetTrie(ByRef key As Variant) As cls_trie
Dim node As cls_trie
Set node = GetNode(key)
If (node.tObject Is Nothing) Then
Set node.tObject = New cls_trie
node.tObject.Init
End If
Set GetTrie = node.tObject
End Function
You can see in the above code:
[a-z]
letters and numbers, and the probability that two text get mapped to the same branch node tends zero.as a great coder said, everyone likes his own code even if other's code is too beautiful to be disliked [1]
My conclusion
Scripting.Dictionary
, even if it is proven that somehow it could be better than my cls_trie
implementation.Thank you all for the help.
Upvotes: 1