Reputation: 367
I think I’ve ‘over thought’ my problem so I’m starting from the beginning again.
I have two spreadsheets, one is an original database (org_DB) and the second is the updated database (new_DB). The number of rows is around 15,000 for org_DB and 18,000 for new_DB. The number of relevant contiguous columns is exactly 14 in both.
I need a third sheet, the results, that contains only the new_DB entries that are DIFFERENT from the original database, and, all new_DB entries that are ADDITIONAL.
The definition of DIFFERENT is a row with greater than 0 differences.
The definition of ADDITIONAL is a row that has no equivalence.
I think I need the two definitions because ‘new DB’ is longer than ‘org DB’ and all my formulas fail at the end point of org_DB.
The two test sized DB are:
org_DB
Code 1 Code 2 Code 3 Code 4
AA00001 AAGA 1180218 24
AA00007 AAGA 03821787-97 58
AA00008 AAGA 11821260-99 59
AA00009 AAGA 11001017 60
AA00016 AAGA 3801648 67
AA00017 AAGA 3801649 120
AA00018 AAGA 3801692 66
AA00019 AAGA 03821084-61 70
new_DB
Code 1 Code 2 Code 3 Code 4
AA00001 AAGA 1180218 24
AA00008 AAGA 11821260-99 59
AA00009 AAGA 11001015 60
AA00016 AAGA 3801648 67
AA00017 AAGA 3801649 120
AA00018 AAGA 3801692 69
AA00019 AAGA 03821084-61 70
XX00101 XXGA 1234X567X 101
XX00102 XXGB 1234X567X 101
Result DB (the result I am looking for)
AA00009 AAGA 11001015 60
AA00018 AAGA 3801692 69
XX00101 XXGA 1234X567X 101
XX00102 XXGB 1234X567X 101
For row comparison (which works on a row by row basis) I’m using
=if(ArrayFormula(sum(--(new_DB!A2:D2=org_DB!A2:D2)))<>4,"Copy row","Ignore")
To get the result array I’m using
=filter(new_DB,if(ArrayFormula(sum(--(new_DB=org_DB)))<>4)
Problem 1 is that the FILTER condition argument only gets a single formula result from the ArrayFormula so fails with an #N/A - “FILTER has mismatched range sizes. Expected row count: nn, column count: 1. Actual row count: 1, column count: 1.”
Problem 2 is that the ArrayFormula after IF is comparing 1 row and nn columns - which I want. Wrapping the whole function in another ArrayFormula gives even stranger results.
Problem 3. Changing the row comparison function from IF ArrayFormula to SUMPRODUCT produces the wrong result when used in a ArrayFormula wrapper.
I can see that if that if I use this method then the recursion process is likely to be very lengthy - so I've come to accept my method is fundamentally flawed. Should I use a VLOOKUP and FILTER combo for column A. Column A is actually a SKU ID so should always be unique.
Can anyone help please. TIA
Note that org_DB row 3 (AA0007...) is not in the results. Deliberate.
Test sheet here: Test DB Sheet
Upvotes: 0
Views: 679
Reputation: 10796
This will look really ugly really quickly with a lot of columns which is why I'm asking of you have any columns you can limit yourself to.
=ARRAYFORMULA(FILTER(
'New DB'!A2:D10,
ISERROR(MATCH('New DB'!A2:A10 & "|" &
'New DB'!B2:B10 & "|" &
'New DB'!C2:C10 & "|" &
'New DB'!D2:D10,
'Org DB'!$A$2:$A$9 & "|" &
'Org DB'!$B$2:$B$9 & "|" &
'Org DB'!$C$2:$C$9 & "|" &
'Org DB'!$D$2:$D$9,
0))))
This filters the content of New DB by whether the concatenated columns of New DB can be found in Org DB, you can use any other Delimiter if you have | in there.
Upvotes: 0