Achal Desai
Achal Desai

Reputation: 93

Excel VBA - N number of Random records per unique group

I am working on developing a system health check tool to validate that 4 different systems are in sync. To do that, I need to create a sample dataset of random N number of records for every unique key/combination from a main data set everyday. All 4 systems will be checked for records from this sample dataset and any differences will be highlighted using conditional formatting.

I am having trouble figuring out how to extract the sample dataset from the main dataset with the criteria mentioned above.

For Example, I have a report that has 700 rows. Each unique combination (concatenation to create a key) of the 6 fields [Client-Contractor-Distribution Center-Service Level-Alert Value-Status] has 100 records. This part will be dynamic. There could be any number of unique combinations and any number of records per combination. Image below for reference. Only the groups are shown here as I cannot paste 700 records in the question. There are 7 unique groups with 100 records each.

There are some questions in the comments for which I am giving the clarifications below: -Combination/Group = Basically a key created with concatenation of the focus columns to recognize/define a category the records may belong to. As example concating First Name & Last Name to create a unique identity of a person.

enter image description here

Let's say I want 5 random records for each of the 7 GroupKeys. Essentially, I need a way to get 35 records that are randomly selected, 5 per unique combination. A sample of the desired output is shown below. enter image description here

I have tried using RAND() and RANDBETWEEN() formulas. I do get random records. But the problem is that I cannot ensure that I get 5 records per combination and sometimes duplicate records are returned as well. I am open to any method (VBA/Formulas) to achieve this.

This is a very complex problem for someone like me who is only a novice/beginner at VBA at most.

Upvotes: 0

Views: 487

Answers (2)

FaneDuru
FaneDuru

Reputation: 42236

Please, test the next code. It needs a reference to 'Microsoft Scripting Runtime':

Sub RandomRecPerGroup()
   Dim sh As Worksheet, shRet As Worksheet, lastR As Long, dict As New Scripting.Dictionary
   Dim arr, arrIt, i As Long, j As Long, f As Long, k As Long, count As Long, arrFin
   
   Set sh = ActiveSheet 'use here the sheet you need
   Set shRet = sh.Next  'use here the sheet you need (for testing reason, the next against the active one)
   shRet.Range("G1").EntireColumn.NumberFormat = "@"   'format the column to keep 'Reference number' as text
   lastR = sh.Range("A" & sh.rows.count).End(xlUp).row 'last row
   arr = sh.Range("A2:G" & lastR).Value ' place the range in an array for faster iteration
   ReDim arrFin(1 To 5, 1 To 7): k = 1  'reDim the array to keep each group
   For i = 1 To UBound(arr)             'iterate between the array elements:
        'create a dictionary key if not already existing, with the number of the row as item:
        If Not dict.Exists(arr(i, 1) & arr(i, 2) & arr(i, 3) & arr(i, 4) & arr(i, 5) & arr(i, 6)) Then
            dict.Add arr(i, 1) & arr(i, 2) & arr(i, 3) & arr(i, 4) & arr(i, 5) & arr(i, 6), i
        Else 'adding the number of row, separated by "|"
            dict(arr(i, 1) & arr(i, 2) & arr(i, 3) & arr(i, 4) & arr(i, 5) & arr(i, 6)) = _
                  dict(arr(i, 1) & arr(i, 2) & arr(i, 3) & arr(i, 4) & arr(i, 5) & arr(i, 6)) & "|" & i
        End If
   Next i
   Dim rndNo As Long              'a variable to receive the random number
   For i = 0 To dict.count - 1    'iterate between the dictionary elements:
        arrIt = Split(dict.items(i), "|"): ' split the item by "|" to obtain the same group existing rows
        For k = 1 To 5            'iterate to extract the 5 necessary sample rows of each group
                Randomize         'initialize the random numbers generation
                If UBound(arrIt) = -1 Then Exit For     'for the case of less than 5 rows per group
                rndNo = CLng(UBound(arrIt) * Rnd())     'give a value to the variable keeping the random numbers
                For f = 1 To 7                          'iterating to place in the array all 7 columns value
                    arrFin(k, f) = arr(arrIt(rndNo), f)
                Next f
                arrIt = Filter(arrIt, arrIt(rndNo), False) 'eliminate the element just placed in an array, to avoid doubling
        Next k
        lastR = shRet.Range("A" & sh.rows.count).End(xlUp).row + 1 'last empty row of the sheet where the result is returned
        shRet.Range("A" & lastR).Resize(5, 7).Value = arrFin       'drop the array content
   Next i
   MsgBox "Ready..."
End Sub

The code may work without the mentioned reference (using labe binding), but I think it should be good to benefit of intellisense suggestions. If it looks complicated to create it, please (firstly) run the next code which will add it automatically:

Sub addScrRunTimeRef()
  'Add a reference to 'Microsoft Scripting Runtime':
  'In case of error ('Programmatic access to Visual Basic Project not trusted'):
  'Options->Trust Center->Trust Center Settings->Macro Settings->Developer Macro Settings->
  '         check "Trust access to the VBA project object model"
  On Error Resume Next
  Application.VBE.ActiveVBProject.References.AddFromFile "C:\Windows\SysWOW64\scrrun.dll"
  If err.Number = 32813 Then
        err.Clear: On Error GoTo 0
        MsgBox "The reference already exists...": Exit Sub
  Else
        On Error GoTo 0
        MsgBox """Microsoft Scripting Runtime"" reference added successfully..."
  End If
End Sub

Saving the workbook will keep the reference. So, no need to run the code again...

Upvotes: 2

Dominique
Dominique

Reputation: 17493

So basically you have 700 possibilities, and you want to get 5 random values out of them, while you are sure that you don't have duplicates?

There are, basically, two ways to do this:

  • You make a resulting collection of random values, you use the random generator to generate numbers from 1 to 700, but before adding them to your collection, you verify if they are already present in your collection. Something like (pseudo-code):

      Dim col_Result as Collection
      Dim finished as Boolean = False;
      Dim r as integer;
    
      while (not finished){
           r = ConvertToInt(Random(700)) + 1;
           if not(col_Result.Contains(r))
           then col_Result.Add(r);
           finished = (col_Result.Count == 5);
      }
    
  • You make a collection of all numbers from 1 to 700, and 5 times you retrieve a random value out of it, while subtracting that value from the collection. Something like (pseudo-code again):

      Dim col_Values as Collection = (1, 2, ..., 700);
      Dim col_Result as Collection;
    
      Dim r as integer;
    
      for (int i = 0; i < 5; i++){
          r = ConvertToInt(Random(700));
          col_Result.Add(r);
          col_Values.Subtract(r);
      }
    

When using this last approach, it is vital that subtracting a value from the collection shifts the other values: (1,2,3,4,5).Subtract(2) yields (1,3,4,5).

Upvotes: 0

Related Questions