Reputation: 13
I have a table that's multiple thousand rows long, and I'd like to find duplicate rows. I've used countifs(), and it's functional, but it's also very resource-intensive, and I'd like to find a way to do this in a less expensive way.
An example of the data I'd like to check is as follows
Date Product Name Quantity Delivered to
10-1-15 Toilet Paper 1 Health Science
10-1-15 Paper Towel 1 Health Science
10-1-15 Can Liners 2 Health Science
10-2-15 Toilet Paper 2 Administration
10-2-15 Paper Towel 1 Health Science
The formula I've used is
=(countifs(A:A,A2,B:B,B2,C:C,C2,D:D,D2))
but this takes multiple minutes to evaluate with the amount of data I have. I'd like to set up a formula that only compares rows with the same date, so earlier rows don't have to recalculate, and later ones don't have to account for earlier additions.
The purpose behind this is to ensure that users of this sheet don't double-enter lines. I do a manual check for duplicates every week, as this is a tool for data mining our paper sign-out logs from a storage warehouse (we're aware it's an outdated system, but it's what we have to work with for now), but it'd make it easier to check my data if the sheet could alert me to possible duplications (especially without it taking 3 to 5 minutes to enter each line).
In addition, I'd prefer to use a formula, rather than VBA, as I have a hard enough time training people to use tools on a network drive without a scary warning about unsafe scripts.
Upvotes: 1
Views: 5046
Reputation: 6659
Formula - Find duplicates
Assuming the data is located in the range A1:D1017
(change as required)
The solutions I propose include the use of two working fields as follows:
A. In E1
enter Key
and in E2
this formula to concatenate all the fields to be validated, then copy the formula till the last record
=CONCATENATE( A2 , CHAR(133) , B2 , CHAR(133) , C2 , CHAR(133) , D2 )
CHAR(133)
character use as separator, omit or change it as required.
B. Second field, called !Chk
, used to flag the Duplicated
records. However, as this solution includes three options, we’ll have for this purpose 3 fields Chk.1
, Chk.2
and Chk.3
in cells F1
, G1
and H1
respectively.
Flag all Duplicated
records (including original)
Enter this formula in cell F2
and copy till last record
=IF( COUNTIF( $E$1:$E$1017, $E2 ) = 1, "", "Duplicated" )
Flag only Duplicated
records, leaves unmarked the first record found.
=IF( COUNTIF( $E$1:$E2, $E2 ) = 1, "", "Duplicated" )
Flag only Duplicated
records, flag the firs record found as Original
.
=IF( COUNTIF( $E$1:$E$1017, $E2 ) = 1, "",
IF( COUNTIF( $E$1:$E2, $E2 ) = 1, "Original", "Duplicated" ) )
Hide column F
if preferable
Upvotes: 1
Reputation: 13
I wrote a terrible bit of excel formula that does the job. It's nearly unreadable, but it doesn't recalculate based on the entire table anymore. I've entered this formula into a column to the right of my data entry, and it puts "!!" into the field if this row duplicates a previous one. It also checks that I've entered a complete row of data, to prevent it evaluating as I'm entering data.
=IF(
COUNTA(A1017:D1017)=4,
IF(
COUNTIFS(
INDIRECT(IF(ROW(<26,"A2:A"&ROW()+25,"A"&ROW()-25&":A"&ROW()+25)),$A1017,
INDIRECT(IF(ROW(<26,"B2:B"&ROW()+25,"B"&ROW()-25&":B"&ROW()+25)),$B1017,
INDIRECT(IF(ROW(<26,"C2:C"&ROW()+25,"C"&ROW()-25&":C"&ROW()+25)),$C1017,
INDIRECT(IF(ROW(<26,"D2:D"&ROW()+25,"D"&ROW()-25&":D"&ROW()+25)),$D1017
)
>1,"!!","")
,"")
What it does is it checks the 25 rows above and below for duplicates, as 25 items is larger than the maximum number of entries added in any one day. The upshot is that this takes less than a second to indicate whether I've entered a duplicate line.
I'm going to try the solution posited by pnuts in the comments
You might get more speed if you check against a PivotTable filtered for the newly entered date, rather than against your raw data.
This seems like a much more elegant solution, even though it doesn't indicate which line is the culprit, it does give me the data necessary to find the offending line.
Upvotes: 0