DeeKay789
DeeKay789

Reputation: 367

Compare rows between two sheets - Function

I think I’ve ‘over thought’ my problem so I’m starting from the beginning again.

I have two spreadsheets, one is an original database (org_DB) and the second is the updated database (new_DB). The number of rows is around 15,000 for org_DB and 18,000 for new_DB. The number of relevant contiguous columns is exactly 14 in both.

I need a third sheet, the results, that contains only the new_DB entries that are DIFFERENT from the original database, and, all new_DB entries that are ADDITIONAL.

The definition of DIFFERENT is a row with greater than 0 differences.

The definition of ADDITIONAL is a row that has no equivalence.

I think I need the two definitions because ‘new DB’ is longer than ‘org DB’ and all my formulas fail at the end point of org_DB.

The two test sized DB are:

org_DB

Code 1    Code 2    Code 3    Code 4
AA00001    AAGA    1180218        24
AA00007    AAGA    03821787-97    58
AA00008    AAGA    11821260-99    59
AA00009    AAGA    11001017       60
AA00016    AAGA    3801648        67
AA00017    AAGA    3801649       120
AA00018    AAGA    3801692        66
AA00019    AAGA    03821084-61    70

new_DB

Code 1    Code 2    Code 3    Code 4
AA00001    AAGA    1180218        24
AA00008    AAGA    11821260-99    59
AA00009    AAGA    11001015       60
AA00016    AAGA    3801648        67
AA00017    AAGA    3801649       120
AA00018    AAGA    3801692        69
AA00019    AAGA    03821084-61    70
XX00101    XXGA    1234X567X     101
XX00102    XXGB    1234X567X     101

Result DB (the result I am looking for)

AA00009    AAGA    11001015    60
AA00018    AAGA    3801692     69
XX00101    XXGA    1234X567X  101
XX00102    XXGB    1234X567X  101

For row comparison (which works on a row by row basis) I’m using

=if(ArrayFormula(sum(--(new_DB!A2:D2=org_DB!A2:D2)))<>4,"Copy row","Ignore")

To get the result array I’m using

=filter(new_DB,if(ArrayFormula(sum(--(new_DB=org_DB)))<>4)

Problem 1 is that the FILTER condition argument only gets a single formula result from the ArrayFormula so fails with an #N/A - “FILTER has mismatched range sizes. Expected row count: nn, column count: 1. Actual row count: 1, column count: 1.”

Problem 2 is that the ArrayFormula after IF is comparing 1 row and nn columns - which I want. Wrapping the whole function in another ArrayFormula gives even stranger results.

Problem 3. Changing the row comparison function from IF ArrayFormula to SUMPRODUCT produces the wrong result when used in a ArrayFormula wrapper.

I can see that if that if I use this method then the recursion process is likely to be very lengthy - so I've come to accept my method is fundamentally flawed. Should I use a VLOOKUP and FILTER combo for column A. Column A is actually a SKU ID so should always be unique.

Can anyone help please. TIA

Note that org_DB row 3 (AA0007...) is not in the results. Deliberate.

Test sheet here: Test DB Sheet

Upvotes: 0

Views: 679

Answers (1)

Robin Gertenbach
Robin Gertenbach

Reputation: 10796

This will look really ugly really quickly with a lot of columns which is why I'm asking of you have any columns you can limit yourself to.

=ARRAYFORMULA(FILTER(
  'New DB'!A2:D10,
  ISERROR(MATCH('New DB'!A2:A10 & "|" & 
                'New DB'!B2:B10 & "|" &
                'New DB'!C2:C10 & "|" &
                'New DB'!D2:D10, 
                'Org DB'!$A$2:$A$9 & "|" & 
                'Org DB'!$B$2:$B$9 & "|" &
                'Org DB'!$C$2:$C$9 & "|" & 
                'Org DB'!$D$2:$D$9,
                0))))

This filters the content of New DB by whether the concatenated columns of New DB can be found in Org DB, you can use any other Delimiter if you have | in there.

Upvotes: 0

Related Questions