Nic
Nic

Reputation: 127

How do I add each row from a spreadsheet to a new workbook only if the row doesn't exist in the new workbook?

Hello I am trying to copy the rows from a spreadsheet to a new workbook but there are a few conditions for it. I am fairly new to VBA and I hope that someone will be able to assist me with this and guide me along. Thank you.

  1. It will only add the row to the most bottom if the row doesn't match with any of the current rows available in the new workbook.

  2. Some of the values in certain columns would change (for example in this case the only columns that will change are (Age, Contact, Email, Occupation) but I would still want to find the closest match in the new workbook and updates it. Two records might have the same combination of Age, Contact, Email, Occupation but not all of the columns will be the same. Example will be that Data A have the same Age and Occupation with Data B but their Email and Contact will be different.

  3. After updating the rows I would like to have a column to show the changes that was made.

Rules that applied

First layer of check

1. Columns that will not change

Name, Age

2. Columns that will change.

Contact, Email, Occupation

Second layer of check

Assuming that two rows have the same Name, Age. It will then look through the columns Contact, Email, Occupation and then matches it with the correct record.

2 out of the 3 columns in Contact, Email, Occupation can be change but at least 1 out of the 3 columns will remain the same and it will be unique from the other record.

This is the default new workbook.

The default new workbook format

Examples of some records being added in the new workbook in the first update from the data to copy table.

Example of some records being added to new workbook

This is the data that I will be copying from into the new workbook. As you can see some fields in the columns value have been changed but I would want to match with the existing row in the new workbook to find the closest match and updates them. This data to copy table

Data to be copy

This is the final version of the new workbook after the update and as you can see it will have a new column "Changes" where it shows the columns that have been changed. For example in the first update the contact number changed from 1234 to 1111. But in the second update (not shown here), it will again change to 4321 so I would like the Changes column to have the record of (Contact : 1234 -> 1111 -> 4321).

Updated new workbook

Upvotes: 1

Views: 198

Answers (1)

DecimalTurn
DecimalTurn

Reputation: 4129

The first problem your are facing here is more of a conceptual problem than a programming problem. The question is how do you decide if 2 rows match or not.

  1. There needs to be a minimum number of columns to match or something like that.

In your example, you matched 2 rows with Alan, 42 years old, engineer, but what if his Occupation was different? Would that still be a match? If yes, what if his age changed? Would that be enough to match with a row in your new workbook? If yes, what do we do if there is 2 people named Alan like we have here?

  1. There could be more than one candidate to match with.

For example, what if you have this data to copy: enter image description here

You can see that it will match 3/5 columns in the first row of the test data and also 3/5 columns in the second row of test data.

enter image description here

So, that's a problem. You need to think of a rule to decide which row should match in that case.

Once you've defined clear rules for matching two rows, then you'll be able to start programming something that apply those rules. (Please edit your question to add those precisions.)

EDIT 1 :

From the edit you've made, there is still one question that remains:

Let's say you still have the same row of data to copy over:

enter image description here

And you see that it matches the Contact column in the first row and the Occupation column in the second row.

enter image description here

We need some sort of priority to choose when there is only one column that match. For example, you could have:

✔ Contact > ✔ Email > ✔ Occupation

Where ">" means "has priority over", and "✔" means that we have a for that column.

So, that would be the fist step. Then, we need to decide what happens if there is more than 2 columns that both match like in this case which is exactly like the one above but both emails match this time:

enter image description here

You then have to decide if

✔ Contact + ✔ Email > ✔ Occupation + ✔ Email

There are different ways to solve this but you could decide that the priority order you defined for above always holds. Basically, using the fact that ✔ Contact > ✔ Email > ✔ Occupation, you could say that has soon has Contact is matched, it will be the prioritized row and if 2 rows match Contact, then we move on to Email. This would yield :

  1. ✔ Contact + ✔ Email + ✔ Occupation (no change)
  2. ✔ Contact + ✔ Email + ❌Occupation
  3. ✔ Contact + ❌Email + ✔ Occupation
  4. ✔ Contact + ❌Email + ❌Occupation
  5. ❌Contact + ✔ Email + ✔ Occupation
  6. ❌Contact + ✔ Email + ❌Occupation
  7. ❌Contact + ❌Email + ✔ Occupation
  8. ❌Contact + ❌Email + ❌Occupation

If you can specify this priority issue, then it would be possible to program this functionnality without too much trouble.


EDIT 2:

Now that you've clarified certain things, here what you could try the following : (Make sure you change the name of the Workbooks, Sheets and Range to fit your needs).

Sub TableJoinTest()

    'Those table columns will have to match for the 2 lines to be a match
    Dim MandatoryHeaders() As Variant
    MandatoryHeaders = Array("Name", "Age")
    
    'Other table columns that could be used to decide of a match if there is 2 rows that match the mandatory columns.
    'These headers will be used to determine which row to match to by order of priority
    Dim OtherHeaders() As Variant
    OtherHeaders = Array("Contact", "Email", "Occupation")

    Dim SourceTableAnchor As Range
    Set SourceTableAnchor = Workbooks("SourceWorkbook.xlsx").Sheets("Sheet1").Range("A1")

    Dim TargetTableAnchor As Range
    Set TargetTableAnchor = Workbooks("TargetWorkbook.xlsx").Sheets("Sheet1").Range("A1")

    TableJoin _
                SourceTableAnchor:=SourceTableAnchor, _
                TargetTableAnchor:=TargetTableAnchor, _
                MandatoryHeaders:=MandatoryHeaders, _
                OtherHeaders:=OtherHeaders, _
                AddIfMissing:=True, _
                IsLogging:=True
    
End Sub



Sub TableJoin( _
                SourceTableAnchor As Range, _
                TargetTableAnchor As Range, _
                MandatoryHeaders As Variant, _
                Optional OtherHeaders As Variant, _
                Optional AddIfMissing As Boolean = False, _
                Optional IsLogging As Boolean = False)
 
    '''''''''''''''''''''''''''''''''''''''
    'Definitions
    '''''''''''''''''''''''''''''''''''''''
    Dim srng As Range, trng As Range
    Set srng = SourceTableAnchor.CurrentRegion
    Set trng = TargetTableAnchor.CurrentRegion
    
    Dim sHeaders As Range, tHeaders As Range
    Set sHeaders = srng.Rows(1)
    Set tHeaders = trng.Rows(1)
    
    'Store in Arrays
    
    Dim sArray() As Variant 'prefix s is for Source
    sArray = ExcludeRows(srng, 1).Value2
    
    Dim tArray() As Variant 'prefix t is for Target
    tArray = ExcludeRows(trng, 1).Value2
    
    Dim sArrayHeader As Variant
    sArrayHeader = sHeaders.Value2
    
    Dim tArrayHeader As Variant
    tArrayHeader = tHeaders.Value2
    
    'Find Column correspondance
    Dim sMandatoryHeadersColumn As Variant
    ReDim sMandatoryHeadersColumn(LBound(MandatoryHeaders) To UBound(MandatoryHeaders))
    Dim tMandatoryHeadersColumn As Variant
    ReDim tMandatoryHeadersColumn(LBound(MandatoryHeaders) To UBound(MandatoryHeaders))
    
    Dim k As Long
    For k = LBound(MandatoryHeaders) To UBound(MandatoryHeaders)
        sMandatoryHeadersColumn(k) = Application.Match(MandatoryHeaders(k), sArrayHeader, 0)
        tMandatoryHeadersColumn(k) = Application.Match(MandatoryHeaders(k), tArrayHeader, 0)
    Next k

    Dim sOtherHeadersColumn As Variant
    ReDim sOtherHeadersColumn(LBound(OtherHeaders) To UBound(OtherHeaders))
    Dim tOtherHeadersColumn As Variant
    ReDim tOtherHeadersColumn(LBound(OtherHeaders) To UBound(OtherHeaders))

    For k = LBound(OtherHeaders) To UBound(OtherHeaders)
        sOtherHeadersColumn(k) = Application.Match(OtherHeaders(k), sArrayHeader, 0)
        tOtherHeadersColumn(k) = Application.Match(OtherHeaders(k), tArrayHeader, 0)
    Next k
    
    
    'Merge mandatory headers into one column (aka the helper column method)
    Dim i As Long, j As Long
    
    Dim sHelperColumn() As Variant
    ReDim sHelperColumn(LBound(sArray, 1) To UBound(sArray, 1), 1 To 1)
    
    For i = LBound(sArray, 1) To UBound(sArray, 1)
        For j = LBound(MandatoryHeaders) To UBound(MandatoryHeaders)
          sHelperColumn(i, 1) = sHelperColumn(i, 1) & sArray(i, sMandatoryHeadersColumn(j))
        Next j
    Next i
    
    Dim tHelperColumn() As Variant
    ReDim tHelperColumn(LBound(tArray, 1) To UBound(tArray, 1), 1 To 1)
    
    For i = LBound(tArray, 1) To UBound(tArray, 1)
        For j = LBound(MandatoryHeaders) To UBound(MandatoryHeaders)
          tHelperColumn(i, 1) = tHelperColumn(i, 1) & tArray(i, tMandatoryHeadersColumn(j))
        Next j
    Next i
    
    'Find all matches
    Dim MatchList() As Variant
    
    Dim LoggingColumn() As String
    ReDim LoggingColumn(LBound(tArray, 1) To UBound(tArray, 1), 1 To 1)
    
    For i = LBound(sArray, 1) To UBound(sArray, 1)
        ReDim MatchList(LBound(tArray, 1) To UBound(tArray, 1))
        For j = LBound(tArray, 1) To UBound(tArray, 1)
            If sHelperColumn(i, 1) = tHelperColumn(j, 1) Then
                MatchList(j) = 1
            End If
        Next j
        
        'Get the row number for the match
        Dim MatchRow As Long
        
        Select Case Application.Sum(MatchList)

        Case Is > 1
        
            'Need to do more matching
            Dim MatchingScoresList() As Long
            ReDim MatchingScoresList(1 To UBound(tArray, 1))
            
            Dim m As Long
            
            For k = LBound(OtherHeaders) To UBound(OtherHeaders)
                For m = LBound(tArray, 1) To UBound(tArray, 1)
                    If tArray(m, sOtherHeadersColumn(k)) = sArray(i, sOtherHeadersColumn(k)) Then
                        MatchingScoresList(m) = MatchingScoresList(m) + 2 ^ (UBound(OtherHeaders) - k)
                    End If
                Next m
            Next k
            
            'Get the max score position
            Dim MyMax As Long
            MyMax = Application.Max(MatchingScoresList)
            If Application.Count(Application.Match(MatchingScoresList(), Array(MyMax), 0)) > 1 Then
                MsgBox "Error: can't determine how to match row " & i & " in source table"
                Exit Sub
            Else
                MatchRow = Application.Match(MyMax, MatchingScoresList, 0)
            End If
            
        Case Is = 1
        
            MatchRow = Application.Match(1, MatchList, 0)
            
        Case Else
            Dim nArray() As Variant, Counter As Long
            If AddIfMissing Then
                MatchRow = 0
                Counter = Counter + 1
                ReDim nArray(1 To Counter, 1 To UBound(tArray, 2))
                For k = LBound(MandatoryHeaders) To UBound(MandatoryHeaders)
                    nArray(Counter, tMandatoryHeadersColumn(k)) = sArray(i, sMandatoryHeadersColumn(k))
                Next k
                For k = LBound(OtherHeaders) To UBound(OtherHeaders)
                    nArray(Counter, tOtherHeadersColumn(k)) = sArray(i, sOtherHeadersColumn(k))
                Next k
            Else
                MsgBox "Error: Couldn't find a match for data row #" & i
                Exit Sub
            End If
        End Select
        
        
        'Logging and assigning values
        If MatchRow > 0 Then
            For k = LBound(OtherHeaders) To UBound(OtherHeaders)
                If tArray(MatchRow, tOtherHeadersColumn(k)) <> sArray(i, sOtherHeadersColumn(k)) Then
                   'Logging
                    If IsLogging Then LoggingColumn(MatchRow, 1) = LoggingColumn(MatchRow, 1) & _
                                                    IIf(LoggingColumn(MatchRow, 1) <> "", ", ", "") & _
                                                    tHeaders.Cells(1, tOtherHeadersColumn(k)) & " : " & _
                                                    tArray(MatchRow, tOtherHeadersColumn(k)) & _
                                                    " -> " & sArray(i, sOtherHeadersColumn(k))
                   'Assign new value
                   tArray(MatchRow, tOtherHeadersColumn(k)) = sArray(i, sOtherHeadersColumn(k))
                End If
            Next k
        End If
        
    Next i
    
    'Write arrays to sheet
    ExcludeRows(trng, 1).Value2 = tArray
    With trng.Parent
        If IsArrayInitialised(nArray) And AddIfMissing Then
            .Cells(trng.Cells(1, 1).Row + trng.Rows.Count, trng.Cells(1, 1).Column).Resize(UBound(nArray, 1), UBound(nArray, 2)).Value2 = nArray
        End If
        If IsLogging Then
            .Cells(trng.Cells(1, 1).Row, trng.Cells(1, 1).Column + trng.Columns.Count) = "Changes"
            .Cells(trng.Cells(2, 1).Row, trng.Cells(1, 1).Column + trng.Columns.Count).Resize(UBound(LoggingColumn, 1), 1).Value2 = LoggingColumn
        End If
    End With

End Sub

Function IsArrayInitialised(ByRef A() As Variant) As Boolean
    On Error Resume Next
    IsArrayInitialised = IsNumeric(UBound(A))
    On Error GoTo 0
End Function

Function ExcludeRows(MyRng As Range, StartRow As Long, Optional EndRow As Long = -1) As Range
'PURPOSE: Exclude one or more consecutives rows from an existing range

Dim Afterpart As Range, BeforePart As Range

If StartRow < 1 Or EndRow > MyRng.Rows.Count Then Set ExcludeRows = Nothing
If StartRow = 1 And EndRow = MyRng.Rows.Count Then Set ExcludeRows = Nothing

If EndRow = -1 Then EndRow = StartRow

    If EndRow < MyRng.Rows.Count Then
        With MyRng.Parent
            Set Afterpart = .Range(MyRng.Cells(EndRow + 1, 1), MyRng.Cells(MyRng.Rows.Count, MyRng.Columns.Count))
        End With
    End If
    
    If StartRow > 1 Then
        With MyRng.Parent
            Set BeforePart = .Range(MyRng.Cells(1, MyRng.Cells(1, 1).Column), MyRng.Cells(StartRow - 1, MyRng.Columns.Count))
        End With
    End If
    
    
    Set ExcludeRows = Union2(True, BeforePart, Afterpart)
        
End Function

Public Function Union2(IgnoreEmptyRange As Boolean, ParamArray RangeArray() As Variant) As Range
'PURPOSE: Samae as Application.Union but allows some range object to be Empty

    Dim V As Variant
    Dim Rng As Range
    For Each V In RangeArray
    Do
        If VarType(V) = vbEmpty Then Exit Do

        Set Rng = V
        
        If Not Union2 Is Nothing Then
            Set Union2 = Union(Union2, Rng)
        ElseIf Not Rng Is Nothing Then
            Set Union2 = Rng
        End If
        
    Loop While False
    Next
    
End Function

Upvotes: 2

Related Questions