sevans
sevans

Reputation: 39

Optimizing Multiple Collection operations in VBA

I'm trying to optimize code that I use to sum certain values in a column based on some criteria. My columns are A (PersonID), B (Firm), and C (ValuetoSum). A very abridged version might look like this:

A             B              C
1             BAML           100
1             HSBC           150
2             HSBC           110
4             CITI           150
5             HSBC           200

I want to loop through each firm in column B, find all of IDs of people that correspond to them and sum the all of values in column C corresponding to those IDs. So for HSBC, the code would collect the IDs 1 and 4 and then sum 130 + 100 + 120 = 460.

I currently do this using multiple loops and collections that take a long time to run. The process is as follows:

For each firm Create a collection of PersonID's based on criteria (firm and year) Create a collection of Values based on collection of person IDs and criteria (year) Sum all the values in the second collection next firm

For those trying to navigate the code below: RP refers to a person and this portion of the code is interested in finding values for the trend year (last year). So TrendYearRPColl is Trend Year Research Partner Collection.

For i2 = 2 To LastRowUniqueClientList

   ActiveFirm = Cells(i2, UniqueClientListColNum).Value

Set TrendYearRPColl = New Collection

    For i3 = 2 To LastRow
        If Cells(i3, DBFirmColNum).Value = ActiveFirm And Cells(i3, DBYearColNum).Value = TrendYear Then
                TrendYearRPColl.Add Cells(i3, DBRespondentKeyColNum).Value

            End If
        Next i3


Set TrendYearMktShareColl = New Collection


For Each TrendYearRP In TrendYearRPColl
    For i7 = 2 To LastRow
        If Cells(i7, DBRespondentKeyColNum).Value = TrendYearRP And Cells(i7, DBYearColNum).Value = TrendYear Then
            TrendYearMktShareColl.Add Cells(i7, DBMktShareVolColNum).Value

        End If
    Next i7
Next TrendYearRP


For Each TrendYearMktShare In TrendYearMktShareColl

    TrendYearSum = TrendYearSum + TrendYearMktShare

Next TrendYearMktShare

I am wondering if anybody here thinks it would be worth converting this operation to several worksheet functions to save calculation time. If it is worth it, I would also really appreciate a recommendation on the direction to take. I have put together a few ws functions that do the job, but they require adding in and writing to columns since I am not very adept with these formulas.

Please let me know if anything needs to be explained better and thanks to anyone who takes a stab at this.

-Steve

Edited to show 460 as output.

Upvotes: 0

Views: 875

Answers (1)

HedgePig
HedgePig

Reputation: 468

Steve, it's not clear from your example what you are wanting. For instance, the PersonID's associated with Firm HSBC are 1,2 & 5. If I add up the ValuetoSum for these ID's, I get 100+150+110+200 = 470. Can you clarify what you mean or what I am misunderstanding? Could you also clarify, how slow "slow" is and what would be an acceptable run time? (I'm not sure if you want 0.1 secs is to slow, or if 50 seconds would be fine.) Also, how many records are you working with?

Edited after Steve's clarification: Ah, got it...I think. So for each firm you trying to find all the client ID's "belonging" to that firm and then add up all the "values" associated with that client ID, even if that same client ID occurs again, associated with another firm? Is that right?

If so, I think you could try the following approach:

This approach requires a single iteration to read in all the data. This firsty iteration calculates the total for each client and also identifies each firm AND the clients belonging to that firm. The second iteration then goes through each client for each firm to get a grand total for each firm.

So if you had 1000 rows if information and 40 firms (lets say each had 50 clients on average), you'd be looking at 1000 initial iterations and the a further 40x50 = 2000 iterations. The second set of iterations doesn't actually require any reading from the spreadsheet (which is quite slow). Hopefully this aopproach is faster. I actually tried this on a sample of random data. I had a a million rows of with around 1300 firms and it ran in just under 40 seconds - so it fully processes about 25,000 rows in a second. (My computer isn't a fast one.) This seems reasonably quick to me but I'm not sure what sort of speed you are looking for.

A more detailed outline of the approach is as follows:

A) Loop through your input and build up:

  1. A collection of unique firm ID's
  2. A collection of unique client ID's with the associated totals for that client ID. (So in your example, your total for ID 1 would start off as 100 and then be updated to 250 once the second record has been read in.)

The problem with the second collection is that you can't store a type double in the collection (with the key being the client ID) and then change that value, at least not directly. So you can't do something like this:

ClientIDCln(ClientID) = ClientIDCln(ClientID) + CurrentRowValue

(where ClientID is the key used to access the running total for a given client)

However, if you instead create a small Class which only has a single public member of type double, then you can add that the ClientID collection and update that total each time you come across the client ID again. So you need to do something like this:

Dim NewEntry As New ClientRunningTotalClass
ClientIDCln.Add NewEntry, Key:=ClientID 
ClientIDCln(ID).RunningTotal = ClientIDCln(ClientID).RunningTotal + Amount

B) The second thing you need to do on your loop through of the data is maintain a "collection of collections". Basically you create an entry in the "master" collection for each unique firm ID. And the entry you create within the master collection is ...a new collection. This new collection is a collection of the client ID's associated with that firm. So in your example, you would have something like

Master Collection Entries     Contents for each collection within the master
BAML                          1
HSBC                          1, 2, 5
CITI                          150

C) Finally, when you have run through your data, you will need to cycle through each collection within the master collection, and adding up the already calculated client totals for each client ID. (Rememember you can use the client ID to access find the total for that client in your "unique client ID collection.:" from step A.

To do all of this you will need to be doing a bit of error handling as you will find that when you are updating your collections that either the item doesn't exist when you want it to or that it already exists when you are trying to keep a unique list.

Anyhow, I hope this helps a bit. Shout if you need more detail.

Finally (although perhaps this should have been first), are you using Application.Screenupdating = FALSE when you are writing your results to the spreadsheet? That can slow things donw a lot. Also have you set calcution mode to manual? (Just checking!)

Edit 2: OK, I've pasted the code below Apart from this you will also need to add a Class module (from the Insert menu) and name it ClientRunningTotalClass (use F4 to bring up the properties and rename it there.) This class is really simple - I've added the code at the end. (And yes, it consists of just two declarations!)

Option Explicit

'Takes a data where each row as a client ID, a firm ID and a total
'It then find all the clients of a particular firm and adds up the totals for those clients (including amounts for that client associated with otehr firms)
Sub SumAllClientAmountsForEveryFirm()

   Dim ClientTotalCln As New Collection      'Collection of totals for each client (client ID used as key)
   Dim FirmCln As New Collection             'Collection of firm ID's (really only needed to print out the FirmID)
   Dim FirmClientListCln As New Collection   'Collection of collections! For each firm a collection object is added to this collection

   Dim WS As Worksheet                       'Worksheet for input and output
   Dim inrow As Long                         'current row of input
   Dim currClientID As String                'current client ID that has just been read on
   Dim currFirm As String                    'current firm
   Dim currAmount As Double                  'current amount

   Dim starttime As Double
   starttime = Now()


   'Loop through all the input rows to do the folloiwng
   '1) Create a collection of client totals
   '2) Create a collection of collections
   '   FirmClientListCln is a collection which itself contains a collections of client ID's (one collection for each firm)
   '   The first time the program comes across a new firm ID, it will add the firm ID to the FirmID collection
   '   _and_ create a new collection in FirmClientListCln. The client is added to the inner collection, as are any subsequent
   '   client ID's that are found for that particular firm
   '   Note that item number n in FirmCln and FirmClientListCln both refer to the same firm. FirmID is really only needed to
   '   keep a track of the firm's ID for printing out purposes.

   Set WS = ThisWorkbook.Worksheets("Sheet1")
   inrow = 5  'Assume first row of input starts in in row 5 (and column 1) of worksheet called "Sheet1"

   Do While WS.Cells(inrow, 1) <> ""
      currClientID = CStr(WS.Cells(inrow, 1))
      currFirm = WS.Cells(inrow, 2)
      currAmount = WS.Cells(inrow, 3)

      Call CalcTotalForClientID(ClientTotalCln, currClientID, currAmount)
      Call UpdateListOfFirmsAndTheirClients(FirmCln, FirmClientListCln, currClientID, currFirm)
      inrow = inrow + 1
   Loop

   'Now dump the results
   Application.ScreenUpdating = False
   Application.Calculation = xlCalculationManual 'prevents workbook from recalculating each time a cell is changed

   'For debugging only - spitting out total for each client. Although the client ID isn't tracked!
   Dim i As Long, j As Long
   Dim FirmTotal As Double
   WS.Range("F4") = "Client ID"
   WS.Range("G4") = "Client Total"
   For i = 1 To ClientTotalCln.Count
      WS.Cells(4 + i, 6) = ClientTotalCln(i).ClientID
      WS.Cells(4 + i, 7) = ClientTotalCln(i).RunningTotal
   Next

   'Now dump totals for each firm
   WS.Range("J4") = "Firm"
   WS.Range("K4") = "Total for all clients"
   For i = 1 To FirmCln.Count
      WS.Cells(4 + i, 10) = FirmCln(i)
      FirmTotal = 0
      For j = 1 To FirmClientListCln(i).Count
         WS.Cells(4 + i, 12 + j) = FirmClientListCln(i).Item(j)  'Debugging - uncomment this if you want to see the client ID's associated with a firm
         FirmTotal = FirmTotal + ClientTotalCln(FirmClientListCln(i).Item(j)).RunningTotal
      Next
      WS.Cells(4 + i, 11) = FirmTotal
   Next
   Application.ScreenUpdating = True
   Application.Calculation = xlCalculationAutomatic

   WS.Range("A3") = "Run time : " & Format(Now() - starttime, "hh:mm:ss")
End Sub

'Keeps a running total of Amount for each ClientID
Sub CalcTotalForClientID(ClientTotalCln As Collection, ClientID As String, Amount As Double)

   'Try an increase the total for the current ClientID
   'If a running total for the current ClientID hasn't already been started an error will be generated.
   'Catch that error, create an entry for that client ID and then try and update the total again.
   On Error GoTo ErrClientIDNotInCollection
   ClientTotalCln(ClientID).RunningTotal = ClientTotalCln(ClientID).RunningTotal + Amount
   On Error GoTo 0

   Exit Sub

'Adds a new instance of a Running Total class to the ClientTotalCln, using the client ID as the
'key
ErrClientIDNotInCollection:
   Dim NewEntry As New ClientRunningTotalClass  'Creates an instance of the clasee to add to the collection. (The "new" keyword is important!)
   NewEntry.ClientID = ClientID
   ClientTotalCln.Add NewEntry, Key:=CStr(ClientID)

   Resume
End Sub

'Keeps a list of firms and the ClientID's belonging to each firm
Sub UpdateListOfFirmsAndTheirClients(FirmCln As Collection, FirmClientListCln As Collection, ClientID As String, Firm As String)

   'Try and add a client ID to the firm
   'This will generate an error if they firm doesn't exist OR
   'if the client ID has already been added
   On Error GoTo ErrFirmNotInCollection
   FirmClientListCln(Firm).Add Item:=ClientID, Key:=ClientID
   On Error GoTo 0
   Exit Sub

ErrFirmNotInCollection:
   Call AddIfFirmNotExists(FirmCln, FirmClientListCln, Firm, ClientID)
   Resume Next

Exit Sub

End Sub

'Adds a new firm to the collection
'Note that we may reach here if the firm does already exist but the client ID has already been added.
'In that case, further errors will be generated and nothing will be done (which is what we want because we already have the client ID)
Sub AddIfFirmNotExists(FirmCln As Collection, FirmClientListCln As Collection, Firm, ClientID)
   Dim ClientTotalCln As New Collection
   On Error Resume Next
   FirmCln.Add Item:=Firm, Key:=Firm
   FirmClientListCln.Add Item:=ClientTotalCln, Key:=Firm
   FirmClientListCln(Firm).Add Item:=ClientID, Key:=CStr(ClientID)
   On Error GoTo 0
End Sub

Code for ClientRunningTotalClass

Option Explicit
'Maintains a running total for a single client.
Public RunningTotal As Double
Public ClientID As String        'Only for debugging (print out the Client ID alongside client total amount)

Edit 3: Handling a 4th column with Year I assume that for the fourth column containing years, you want to treat "HSBC 2014" as being a completely different beast from "HSBC 2015" and likewise "Customer 1 2014" as a different animal from "Customer 1 2015". If so, I can think of two approaches off the top of my head that should work. The first is to presort the data by year and then process it blocks by year. (i.e. once you come to a row with new year, you spit our a summary and start with the next block). The other would be to use a key to the collection that consists of both Firm and Year, e.g. "HSBC|2015" and, likewise, a customer ID that consists of both the ID and the year, "1|2015" You may need to create a new class to keep of the Firm and year. (The new class would contain both the Firm and the year as fields) This because currently the FirmCln just has the Firm name added directly into it (you can do this with "native" type data such as int or double or string). However, if you want to add in name AND year, you could create a class to store this. Or you could concatenate them into one string and then split the string when you dump the results into Excel. Anyhow these are just some thoughts - hope you get it all working.

Upvotes: 1

Related Questions