Maldred
Maldred

Reputation: 1104

Find Existing String in 2-D Array

I'm collecting data from a spreadsheet and storing it in a 2-D Array, the idea is that once the script detects it's reading from a specific column, it would not read an entire row of the data (as this would be considered a duplicate).

CODE:

Private Sub LoadData()

   cDOC_DEBUG "Loading document data..."
   Dim x As Long  'Column Data - there is another function that reads when x = 0 = header; else every other value is considered "data"
   Dim y As Long 

   With dataWS
      For x = 1 To LR - 1
         For y = 1 To LC - 1
            If (IsInArray(.Cells(x + 1, y + 1).value, pData())) Then
               cDOC_DEBUG "Added: " & .Cells(x + 1, y + 1).value
               pData(x, y) = Trim(.Cells(x + 1, y + 1).value)
            End If
         Next y
      Next x
   End With

End Sub

Private Function IsInArray(stringToBeFound As String, arrString As Variant) As Boolean
   IsInArray = (UBound(Filter(arrString, stringToBeFound)) > -1)
End Function

Private Sub cDOC_DEBUG(debugText As String)
   If (ThisWorkbook.Worksheets("Settings").Cells(3, 2)) Then
      Debug.Print debugText
   End If
End Sub

Everything is loading into the array fine, until I start implementing my IsInArray function. I can see it has to do with the fact that it's searching through a single dimensional array, and my array is two dimensional; so it makes sense that it's getting a type mismatch error.

Each row within the spreadsheet is a segment of information that correlates to it's self.

Initial Data From Spreadsheet:

        A           B           C           D
1    header1     header2     header3     header4
2       a           b           c           d
3       w           x           y           z
4       a           h           j           j
5       a           b           j           d
6       w           x           u           z

2x2 Final Array:

        0           1           2           3
0    header1     header2     header3     header4
1       a           b           c           d
2       w           x           y           z
3       a           h           j           j

Because Header1 & Header2 & Header4 from Excel rows 5 & 6 have the same values as Excel rows 2 and 3, this will not be read into the array.

Question:

How would I match the criteria above to not include the duplicates from a row.

Example Sudo Code:

If (Value being added matches all values from column Header1 & Header2 & Header3_ Then

Don't add to array

Another issue that I am aware of, is that there will be blank data within this array; is there something I can do to either 1 remove these or will I have to have another index for the array slots to keep track of?

Upvotes: 1

Views: 481

Answers (2)

T.M.
T.M.

Reputation: 9948

Alternative using advanced Index function

This approach using a (late bound) dictionary should be helpful, if your data rows don't exceed the number of 65536. You'll get a 2-dim (1-based) array v with the unique data set of columns A,B and D.

In this example code results are written back to e.g. columns F:H and values of column C are omitted; if you want to maintain these values see ► Edit below.

Example code (omitting column C in resulting array)

Sub getUniqueRows()
Dim dict As Object, v, i&, ii&, n&, currRow$
Set dict = CreateObject("Scripting.Dictionary")         ' late binding dictionary
With ThisWorkbook.Worksheets("MySheet")                 ' << change to your sheet name
  n = .Cells(.Rows.Count, "A").End(xlUp).Row - 1     ' n items (omitting header line)
' [1] get data
  v = .Range("A2:D" & n + 1).Value
' [2a]remove column C (i.e. allow columns 1, 2 and 4 only)
  v = Application.Index(v, Evaluate("row(1:" & n & ")"), Array(1, 2, 4))
' [2b] check for unique ones
  For i = 1 To n
     currRow = Join(Application.Index(v, i, 0), ",") ' build string of cells A,B & D
     If Not dict.Exists(currRow) Then dict.Add currRow, i
  Next i
' [3] remove duplicate rows
  v = Application.Transpose(Application.Index(v, dict.Items, Evaluate("row(1:" & 3 & ")")))
' [4] write data to any wanted range
  .Range("F:H") = ""                                 ' clear rows
  .Range("F2").Resize(UBound(v), 3) = v              ' write data
End With
Set dict = Nothing
End Sub

Note

The dict.Items collection in section [3] is an array of all found item numbers in the dictionary and allows the Index function to get only these items.

Additional links

See Insert new first column in datafield array without loops or API call

Edit - maintain values in column C

Due to comment: "ONLY using columns A, B, and D; Column C was not including in the criteria."

If you want to check values only in A,B and D, but maintain the C values in the resulting array you can use the following optimized code neglecting an empty values row.

Sub getUniqueRows2()
Dim dict As Object, v, i&, n&, j&, currRow$
Set dict = CreateObject("Scripting.Dictionary")          ' late binding dictionary
With ThisWorkbook.Worksheets("MySheet")                  ' << change to your sheet name
     n = .Cells(.Rows.Count, "A").End(xlUp).Row - 1      ' items counter  (omitting header line)
   ' [1] get data
     v = .Range("A2:D" & n + 1).Value
   ' [2] check for unique ones
     For i = 1 To UBound(v)
       ' assign ONLY criteria of 1st, 2nd & 4th column to string value currRow
         currRow = ""
         For j = 0 To 2: currRow = currRow & v(i, Array(1, 2, 4)(j)) & ",": Next j
       ' add first unique occurrence to dictionary
         If Not dict.Exists(currRow) Then                 ' add first occurrence
             If Len(currRow) > 3 Then dict.Add currRow, i ' ... and ignore empty values
         End If
     Next i
   ' [3] remove duplicate rows
     v = Application.Transpose(Application.Index(v, dict.Items, Evaluate("row(1:" & 4 & ")")))
   ' [4] write resulting array values anywhere, e.g. to columns F:I
     .Range("F:I") = ""                                   ' clear rows
     .Range("F2").Resize(UBound(v), 4) = v                ' write data
End With
Set dict = Nothing
End Sub

Upvotes: 0

QHarr
QHarr

Reputation: 84465

You can loop rows/columns and use Index to slice a row/column out of the array and use Match to test if search value is in that column. Combine with Count to test for duplicates. If the count equals the number of columns ignore value (or column count -1... see next comment ==>). Not entirely sure about this imaginary column. Do you intend to dimension at start with an additional empty column?

Row Versions:

Exists:

Option Explicit
Public Sub CheckRow()
    Dim arr(), i As Long
    arr = [A1:D6].Value                          '<==2D array created

    For i = LBound(arr, 1) To UBound(arr, 1)     '<== loop rows
        'look in each row for x and if found exit loop and indicate row where found
        If Not IsError(Application.Match("x", Application.WorksheetFunction.Index(arr, i, 0), 0)) Then
            Debug.Print "value found in column " & i
            Exit For
        End If
    Next
End Sub

Duplicates:

Option Explicit
Public Sub CheckRow()
    Dim arr(), i As Long
    arr = [A1:D6].Value                          '<==2D array created

    For i = LBound(arr, 1) To UBound(arr, 1)     '<== loop rows
        'look in each row for more than one "B" and if found exit loop and indicate row where found
         If Application.Count(Application.Match(Application.WorksheetFunction.Index(arr, i, 0), "B", 0)) > 1 Then
            Debug.Print i
            Exit For
        End If
    Next
End Sub

exists:


Columns versions:

Exists:

Option Explicit
Public Sub CheckColumn()
    Dim arr(), i As Long
    arr = [A1:D6].Value                          '<==2D array created

    For i = LBound(arr, 2) To UBound(arr, 2)     '<== loop columns
        'look in each column for x and if found exit loop and indicate column where found
        If Not IsError(Application.Match("x", Application.WorksheetFunction.Transpose(Application.WorksheetFunction.Index(arr, 0, i)), 0)) Then
            Debug.Print "value found in column " & i
            Exit For
        End If
    Next
End Sub

Duplicates:

You can use Count to check for duplicates within an entire column, again sliced with Index:

Option Explicit
Public Sub CheckColumn()
    Dim arr(), i As Long
    arr = [A1:D6].Value                          '<==2D array created

    For i = LBound(arr, 2) To UBound(arr, 2)     '<== loop columns
        'look in each column for more than one "B" and if found exit loop and indicate column where found
         If Application.Count(Application.Match(Application.WorksheetFunction.Transpose(Application.WorksheetFunction.Index(arr, 0, i)), "B", 0)) > 1 Then
            Debug.Print i
            Exit For
        End If
    Next
End Sub

Using sample data in sheet:

data

Upvotes: 1

Related Questions