Reputation: 11
I have a list of data that has columns for different users, paths as well as request timestamps, in Excel.
Now the challenge is: A request is counted as duplicate when it's created by the same user, for the same path, within 20 minutes of another request.
I have tried the following code: =IF(COUNTIFS($A$2:$A$8,$A2,$B$2:$B$8,$B2)>1, "Duplicate row", "")
Whereby: Column A is for users, Column B is for paths, in order to find those duplicate entries for same users, same paths.
Then I'm stuck with the time factors. What to do next, guys? How to arrive at the final duplicate requests?
Upvotes: 1
Views: 904
Reputation: 11968
You can use SUMPRODUCT function, if data not sorted:
=SUMPRODUCT((A2=$A$2:$A$13)*(B2=$B$2:$B$13)*(C2-1/24/3<$C$2:$C$13)*(C2+1/24/3>$C$2:$C$13))>1
it marks all duplicate rows as TRUE
or, if data is sorted by Time
field ascending:
=SUMPRODUCT((A2=$A$1:A1)*(B2=$B$1:B1)*(C2-1/24/3<$C$1:C1))>0
it marks all but not first duplicate as TRUE
Upvotes: 1
Reputation: 286
You can use the following algorithm
IF (the time-step difference is <20; IF (if true 1) the pattern = pattern; (if true 2) “duplicate”; (if false 2) “no duplicate”; (if false 1) “no duplicate”)
Upvotes: 0