Reputation: 1104
I'm collecting data from a spreadsheet and storing it in a 2-D Array, the idea is that once the script detects it's reading from a specific column, it would not read an entire row of the data (as this would be considered a duplicate).
CODE:
Private Sub LoadData()
cDOC_DEBUG "Loading document data..."
Dim x As Long 'Column Data - there is another function that reads when x = 0 = header; else every other value is considered "data"
Dim y As Long
With dataWS
For x = 1 To LR - 1
For y = 1 To LC - 1
If (IsInArray(.Cells(x + 1, y + 1).value, pData())) Then
cDOC_DEBUG "Added: " & .Cells(x + 1, y + 1).value
pData(x, y) = Trim(.Cells(x + 1, y + 1).value)
End If
Next y
Next x
End With
End Sub
Private Function IsInArray(stringToBeFound As String, arrString As Variant) As Boolean
IsInArray = (UBound(Filter(arrString, stringToBeFound)) > -1)
End Function
Private Sub cDOC_DEBUG(debugText As String)
If (ThisWorkbook.Worksheets("Settings").Cells(3, 2)) Then
Debug.Print debugText
End If
End Sub
Everything is loading into the array fine, until I start implementing my IsInArray
function. I can see it has to do with the fact that it's searching through a single dimensional array, and my array is two dimensional; so it makes sense that it's getting a type mismatch error.
Each row within the spreadsheet is a segment of information that correlates to it's self.
Initial Data From Spreadsheet:
A B C D
1 header1 header2 header3 header4
2 a b c d
3 w x y z
4 a h j j
5 a b j d
6 w x u z
2x2 Final Array:
0 1 2 3
0 header1 header2 header3 header4
1 a b c d
2 w x y z
3 a h j j
Because Header1 & Header2 & Header4 from Excel rows 5 & 6 have the same values as Excel rows 2 and 3, this will not be read into the array.
Question:
How would I match the criteria above to not include the duplicates from a row.
Example Sudo Code:
If (Value being added matches all values from column Header1 & Header2 & Header3_ Then
Don't add to array
Another issue that I am aware of, is that there will be blank data within this array; is there something I can do to either 1 remove these or will I have to have another index for the array slots to keep track of?
Upvotes: 1
Views: 481
Reputation: 9948
Alternative using advanced Index
function
This approach using a (late bound) dictionary should be helpful, if your data rows don't exceed the number of 65536. You'll get a 2-dim (1-based) array v
with the unique data set of columns A,B and D.
In this example code results are written back to e.g. columns F:H and values of column C are omitted; if you want to maintain these values see ► Edit below.
Example code (omitting column C in resulting array)
Sub getUniqueRows()
Dim dict As Object, v, i&, ii&, n&, currRow$
Set dict = CreateObject("Scripting.Dictionary") ' late binding dictionary
With ThisWorkbook.Worksheets("MySheet") ' << change to your sheet name
n = .Cells(.Rows.Count, "A").End(xlUp).Row - 1 ' n items (omitting header line)
' [1] get data
v = .Range("A2:D" & n + 1).Value
' [2a]remove column C (i.e. allow columns 1, 2 and 4 only)
v = Application.Index(v, Evaluate("row(1:" & n & ")"), Array(1, 2, 4))
' [2b] check for unique ones
For i = 1 To n
currRow = Join(Application.Index(v, i, 0), ",") ' build string of cells A,B & D
If Not dict.Exists(currRow) Then dict.Add currRow, i
Next i
' [3] remove duplicate rows
v = Application.Transpose(Application.Index(v, dict.Items, Evaluate("row(1:" & 3 & ")")))
' [4] write data to any wanted range
.Range("F:H") = "" ' clear rows
.Range("F2").Resize(UBound(v), 3) = v ' write data
End With
Set dict = Nothing
End Sub
Note
The dict.Items
collection in section [3] is an array of all found item numbers in the dictionary and allows the Index
function to get only these items.
Additional links
See Insert new first column in datafield array without loops or API call
Edit - maintain values in column C
Due to comment: "ONLY using columns A, B, and D; Column C was not including in the criteria."
If you want to check values only in A,B and D, but maintain the C values in the resulting array you can use the following optimized code neglecting an empty values row.
Sub getUniqueRows2()
Dim dict As Object, v, i&, n&, j&, currRow$
Set dict = CreateObject("Scripting.Dictionary") ' late binding dictionary
With ThisWorkbook.Worksheets("MySheet") ' << change to your sheet name
n = .Cells(.Rows.Count, "A").End(xlUp).Row - 1 ' items counter (omitting header line)
' [1] get data
v = .Range("A2:D" & n + 1).Value
' [2] check for unique ones
For i = 1 To UBound(v)
' assign ONLY criteria of 1st, 2nd & 4th column to string value currRow
currRow = ""
For j = 0 To 2: currRow = currRow & v(i, Array(1, 2, 4)(j)) & ",": Next j
' add first unique occurrence to dictionary
If Not dict.Exists(currRow) Then ' add first occurrence
If Len(currRow) > 3 Then dict.Add currRow, i ' ... and ignore empty values
End If
Next i
' [3] remove duplicate rows
v = Application.Transpose(Application.Index(v, dict.Items, Evaluate("row(1:" & 4 & ")")))
' [4] write resulting array values anywhere, e.g. to columns F:I
.Range("F:I") = "" ' clear rows
.Range("F2").Resize(UBound(v), 4) = v ' write data
End With
Set dict = Nothing
End Sub
Upvotes: 0
Reputation: 84465
You can loop rows/columns and use Index
to slice a row/column out of the array and use Match
to test if search value is in that column. Combine with Count
to test for duplicates. If the count equals the number of columns ignore value (or column count -1... see next comment ==>). Not entirely sure about this imaginary column. Do you intend to dimension at start with an additional empty column?
Row Versions:
Exists:
Option Explicit
Public Sub CheckRow()
Dim arr(), i As Long
arr = [A1:D6].Value '<==2D array created
For i = LBound(arr, 1) To UBound(arr, 1) '<== loop rows
'look in each row for x and if found exit loop and indicate row where found
If Not IsError(Application.Match("x", Application.WorksheetFunction.Index(arr, i, 0), 0)) Then
Debug.Print "value found in column " & i
Exit For
End If
Next
End Sub
Duplicates:
Option Explicit
Public Sub CheckRow()
Dim arr(), i As Long
arr = [A1:D6].Value '<==2D array created
For i = LBound(arr, 1) To UBound(arr, 1) '<== loop rows
'look in each row for more than one "B" and if found exit loop and indicate row where found
If Application.Count(Application.Match(Application.WorksheetFunction.Index(arr, i, 0), "B", 0)) > 1 Then
Debug.Print i
Exit For
End If
Next
End Sub
exists:
Columns versions:
Exists:
Option Explicit
Public Sub CheckColumn()
Dim arr(), i As Long
arr = [A1:D6].Value '<==2D array created
For i = LBound(arr, 2) To UBound(arr, 2) '<== loop columns
'look in each column for x and if found exit loop and indicate column where found
If Not IsError(Application.Match("x", Application.WorksheetFunction.Transpose(Application.WorksheetFunction.Index(arr, 0, i)), 0)) Then
Debug.Print "value found in column " & i
Exit For
End If
Next
End Sub
Duplicates:
You can use Count
to check for duplicates within an entire column, again sliced with Index
:
Option Explicit
Public Sub CheckColumn()
Dim arr(), i As Long
arr = [A1:D6].Value '<==2D array created
For i = LBound(arr, 2) To UBound(arr, 2) '<== loop columns
'look in each column for more than one "B" and if found exit loop and indicate column where found
If Application.Count(Application.Match(Application.WorksheetFunction.Transpose(Application.WorksheetFunction.Index(arr, 0, i)), "B", 0)) > 1 Then
Debug.Print i
Exit For
End If
Next
End Sub
Using sample data in sheet:
Upvotes: 1