Reputation: 127
Hello I am trying to copy the rows from a spreadsheet to a new workbook but there are a few conditions for it. I am fairly new to VBA and I hope that someone will be able to assist me with this and guide me along. Thank you.
It will only add the row to the most bottom if the row doesn't match with any of the current rows available in the new workbook.
Some of the values in certain columns would change (for example in this case the only columns that will change are (Age, Contact, Email, Occupation) but I would still want to find the closest match in the new workbook and updates it. Two records might have the same combination of Age, Contact, Email, Occupation but not all of the columns will be the same. Example will be that Data A have the same Age and Occupation with Data B but their Email and Contact will be different.
After updating the rows I would like to have a column to show the changes that was made.
Rules that applied
First layer of check
1. Columns that will not change
Name, Age
2. Columns that will change.
Contact, Email, Occupation
Second layer of check
Assuming that two rows have the same Name, Age. It will then look through the columns Contact, Email, Occupation and then matches it with the correct record.
2 out of the 3 columns in Contact, Email, Occupation can be change but at least 1 out of the 3 columns will remain the same and it will be unique from the other record.
This is the default new workbook.
Examples of some records being added in the new workbook in the first update from the data to copy table.
This is the data that I will be copying from into the new workbook. As you can see some fields in the columns value have been changed but I would want to match with the existing row in the new workbook to find the closest match and updates them. This data to copy table
This is the final version of the new workbook after the update and as you can see it will have a new column "Changes" where it shows the columns that have been changed. For example in the first update the contact number changed from 1234 to 1111. But in the second update (not shown here), it will again change to 4321 so I would like the Changes column to have the record of (Contact : 1234 -> 1111 -> 4321).
Upvotes: 1
Views: 198
Reputation: 4129
The first problem your are facing here is more of a conceptual problem than a programming problem. The question is how do you decide if 2 rows match or not.
In your example, you matched 2 rows with Alan, 42 years old, engineer, but what if his Occupation was different? Would that still be a match? If yes, what if his age changed? Would that be enough to match with a row in your new workbook? If yes, what do we do if there is 2 people named Alan like we have here?
For example, what if you have this data to copy:
You can see that it will match 3/5 columns in the first row of the test data and also 3/5 columns in the second row of test data.
So, that's a problem. You need to think of a rule to decide which row should match in that case.
Once you've defined clear rules for matching two rows, then you'll be able to start programming something that apply those rules. (Please edit your question to add those precisions.)
EDIT 1 :
From the edit you've made, there is still one question that remains:
Let's say you still have the same row of data to copy over:
And you see that it matches the Contact column in the first row and the Occupation column in the second row.
We need some sort of priority to choose when there is only one column that match. For example, you could have:
✔ Contact > ✔ Email > ✔ Occupation
Where ">" means "has priority over", and "✔" means that we have a for that column.
So, that would be the fist step. Then, we need to decide what happens if there is more than 2 columns that both match like in this case which is exactly like the one above but both emails match this time:
You then have to decide if
✔ Contact + ✔ Email > ✔ Occupation + ✔ Email
There are different ways to solve this but you could decide that the priority order you defined for above always holds. Basically, using the fact that ✔ Contact > ✔ Email > ✔ Occupation, you could say that has soon has Contact is matched, it will be the prioritized row and if 2 rows match Contact, then we move on to Email. This would yield :
If you can specify this priority issue, then it would be possible to program this functionnality without too much trouble.
EDIT 2:
Now that you've clarified certain things, here what you could try the following : (Make sure you change the name of the Workbooks, Sheets and Range to fit your needs).
Sub TableJoinTest()
'Those table columns will have to match for the 2 lines to be a match
Dim MandatoryHeaders() As Variant
MandatoryHeaders = Array("Name", "Age")
'Other table columns that could be used to decide of a match if there is 2 rows that match the mandatory columns.
'These headers will be used to determine which row to match to by order of priority
Dim OtherHeaders() As Variant
OtherHeaders = Array("Contact", "Email", "Occupation")
Dim SourceTableAnchor As Range
Set SourceTableAnchor = Workbooks("SourceWorkbook.xlsx").Sheets("Sheet1").Range("A1")
Dim TargetTableAnchor As Range
Set TargetTableAnchor = Workbooks("TargetWorkbook.xlsx").Sheets("Sheet1").Range("A1")
TableJoin _
SourceTableAnchor:=SourceTableAnchor, _
TargetTableAnchor:=TargetTableAnchor, _
MandatoryHeaders:=MandatoryHeaders, _
OtherHeaders:=OtherHeaders, _
AddIfMissing:=True, _
IsLogging:=True
End Sub
Sub TableJoin( _
SourceTableAnchor As Range, _
TargetTableAnchor As Range, _
MandatoryHeaders As Variant, _
Optional OtherHeaders As Variant, _
Optional AddIfMissing As Boolean = False, _
Optional IsLogging As Boolean = False)
'''''''''''''''''''''''''''''''''''''''
'Definitions
'''''''''''''''''''''''''''''''''''''''
Dim srng As Range, trng As Range
Set srng = SourceTableAnchor.CurrentRegion
Set trng = TargetTableAnchor.CurrentRegion
Dim sHeaders As Range, tHeaders As Range
Set sHeaders = srng.Rows(1)
Set tHeaders = trng.Rows(1)
'Store in Arrays
Dim sArray() As Variant 'prefix s is for Source
sArray = ExcludeRows(srng, 1).Value2
Dim tArray() As Variant 'prefix t is for Target
tArray = ExcludeRows(trng, 1).Value2
Dim sArrayHeader As Variant
sArrayHeader = sHeaders.Value2
Dim tArrayHeader As Variant
tArrayHeader = tHeaders.Value2
'Find Column correspondance
Dim sMandatoryHeadersColumn As Variant
ReDim sMandatoryHeadersColumn(LBound(MandatoryHeaders) To UBound(MandatoryHeaders))
Dim tMandatoryHeadersColumn As Variant
ReDim tMandatoryHeadersColumn(LBound(MandatoryHeaders) To UBound(MandatoryHeaders))
Dim k As Long
For k = LBound(MandatoryHeaders) To UBound(MandatoryHeaders)
sMandatoryHeadersColumn(k) = Application.Match(MandatoryHeaders(k), sArrayHeader, 0)
tMandatoryHeadersColumn(k) = Application.Match(MandatoryHeaders(k), tArrayHeader, 0)
Next k
Dim sOtherHeadersColumn As Variant
ReDim sOtherHeadersColumn(LBound(OtherHeaders) To UBound(OtherHeaders))
Dim tOtherHeadersColumn As Variant
ReDim tOtherHeadersColumn(LBound(OtherHeaders) To UBound(OtherHeaders))
For k = LBound(OtherHeaders) To UBound(OtherHeaders)
sOtherHeadersColumn(k) = Application.Match(OtherHeaders(k), sArrayHeader, 0)
tOtherHeadersColumn(k) = Application.Match(OtherHeaders(k), tArrayHeader, 0)
Next k
'Merge mandatory headers into one column (aka the helper column method)
Dim i As Long, j As Long
Dim sHelperColumn() As Variant
ReDim sHelperColumn(LBound(sArray, 1) To UBound(sArray, 1), 1 To 1)
For i = LBound(sArray, 1) To UBound(sArray, 1)
For j = LBound(MandatoryHeaders) To UBound(MandatoryHeaders)
sHelperColumn(i, 1) = sHelperColumn(i, 1) & sArray(i, sMandatoryHeadersColumn(j))
Next j
Next i
Dim tHelperColumn() As Variant
ReDim tHelperColumn(LBound(tArray, 1) To UBound(tArray, 1), 1 To 1)
For i = LBound(tArray, 1) To UBound(tArray, 1)
For j = LBound(MandatoryHeaders) To UBound(MandatoryHeaders)
tHelperColumn(i, 1) = tHelperColumn(i, 1) & tArray(i, tMandatoryHeadersColumn(j))
Next j
Next i
'Find all matches
Dim MatchList() As Variant
Dim LoggingColumn() As String
ReDim LoggingColumn(LBound(tArray, 1) To UBound(tArray, 1), 1 To 1)
For i = LBound(sArray, 1) To UBound(sArray, 1)
ReDim MatchList(LBound(tArray, 1) To UBound(tArray, 1))
For j = LBound(tArray, 1) To UBound(tArray, 1)
If sHelperColumn(i, 1) = tHelperColumn(j, 1) Then
MatchList(j) = 1
End If
Next j
'Get the row number for the match
Dim MatchRow As Long
Select Case Application.Sum(MatchList)
Case Is > 1
'Need to do more matching
Dim MatchingScoresList() As Long
ReDim MatchingScoresList(1 To UBound(tArray, 1))
Dim m As Long
For k = LBound(OtherHeaders) To UBound(OtherHeaders)
For m = LBound(tArray, 1) To UBound(tArray, 1)
If tArray(m, sOtherHeadersColumn(k)) = sArray(i, sOtherHeadersColumn(k)) Then
MatchingScoresList(m) = MatchingScoresList(m) + 2 ^ (UBound(OtherHeaders) - k)
End If
Next m
Next k
'Get the max score position
Dim MyMax As Long
MyMax = Application.Max(MatchingScoresList)
If Application.Count(Application.Match(MatchingScoresList(), Array(MyMax), 0)) > 1 Then
MsgBox "Error: can't determine how to match row " & i & " in source table"
Exit Sub
Else
MatchRow = Application.Match(MyMax, MatchingScoresList, 0)
End If
Case Is = 1
MatchRow = Application.Match(1, MatchList, 0)
Case Else
Dim nArray() As Variant, Counter As Long
If AddIfMissing Then
MatchRow = 0
Counter = Counter + 1
ReDim nArray(1 To Counter, 1 To UBound(tArray, 2))
For k = LBound(MandatoryHeaders) To UBound(MandatoryHeaders)
nArray(Counter, tMandatoryHeadersColumn(k)) = sArray(i, sMandatoryHeadersColumn(k))
Next k
For k = LBound(OtherHeaders) To UBound(OtherHeaders)
nArray(Counter, tOtherHeadersColumn(k)) = sArray(i, sOtherHeadersColumn(k))
Next k
Else
MsgBox "Error: Couldn't find a match for data row #" & i
Exit Sub
End If
End Select
'Logging and assigning values
If MatchRow > 0 Then
For k = LBound(OtherHeaders) To UBound(OtherHeaders)
If tArray(MatchRow, tOtherHeadersColumn(k)) <> sArray(i, sOtherHeadersColumn(k)) Then
'Logging
If IsLogging Then LoggingColumn(MatchRow, 1) = LoggingColumn(MatchRow, 1) & _
IIf(LoggingColumn(MatchRow, 1) <> "", ", ", "") & _
tHeaders.Cells(1, tOtherHeadersColumn(k)) & " : " & _
tArray(MatchRow, tOtherHeadersColumn(k)) & _
" -> " & sArray(i, sOtherHeadersColumn(k))
'Assign new value
tArray(MatchRow, tOtherHeadersColumn(k)) = sArray(i, sOtherHeadersColumn(k))
End If
Next k
End If
Next i
'Write arrays to sheet
ExcludeRows(trng, 1).Value2 = tArray
With trng.Parent
If IsArrayInitialised(nArray) And AddIfMissing Then
.Cells(trng.Cells(1, 1).Row + trng.Rows.Count, trng.Cells(1, 1).Column).Resize(UBound(nArray, 1), UBound(nArray, 2)).Value2 = nArray
End If
If IsLogging Then
.Cells(trng.Cells(1, 1).Row, trng.Cells(1, 1).Column + trng.Columns.Count) = "Changes"
.Cells(trng.Cells(2, 1).Row, trng.Cells(1, 1).Column + trng.Columns.Count).Resize(UBound(LoggingColumn, 1), 1).Value2 = LoggingColumn
End If
End With
End Sub
Function IsArrayInitialised(ByRef A() As Variant) As Boolean
On Error Resume Next
IsArrayInitialised = IsNumeric(UBound(A))
On Error GoTo 0
End Function
Function ExcludeRows(MyRng As Range, StartRow As Long, Optional EndRow As Long = -1) As Range
'PURPOSE: Exclude one or more consecutives rows from an existing range
Dim Afterpart As Range, BeforePart As Range
If StartRow < 1 Or EndRow > MyRng.Rows.Count Then Set ExcludeRows = Nothing
If StartRow = 1 And EndRow = MyRng.Rows.Count Then Set ExcludeRows = Nothing
If EndRow = -1 Then EndRow = StartRow
If EndRow < MyRng.Rows.Count Then
With MyRng.Parent
Set Afterpart = .Range(MyRng.Cells(EndRow + 1, 1), MyRng.Cells(MyRng.Rows.Count, MyRng.Columns.Count))
End With
End If
If StartRow > 1 Then
With MyRng.Parent
Set BeforePart = .Range(MyRng.Cells(1, MyRng.Cells(1, 1).Column), MyRng.Cells(StartRow - 1, MyRng.Columns.Count))
End With
End If
Set ExcludeRows = Union2(True, BeforePart, Afterpart)
End Function
Public Function Union2(IgnoreEmptyRange As Boolean, ParamArray RangeArray() As Variant) As Range
'PURPOSE: Samae as Application.Union but allows some range object to be Empty
Dim V As Variant
Dim Rng As Range
For Each V In RangeArray
Do
If VarType(V) = vbEmpty Then Exit Do
Set Rng = V
If Not Union2 Is Nothing Then
Set Union2 = Union(Union2, Rng)
ElseIf Not Rng Is Nothing Then
Set Union2 = Rng
End If
Loop While False
Next
End Function
Upvotes: 2