Reputation: 9
I want to use sqlite to number rows in a csv file by two criteria: Duplicate Id number and Created Date. (I am analyzing duplicate account data generated by DemandTools.) Just to help explain, here is a simplified version of what I want to do:
Want to turn this:
Dupe Id # | Account Name | Created Date |
1882 | A1, Inc. | 03/15/2015 |
1567 | Joe's Plumbing | 08/01/2019 |
1567 | Joes plubming | 02/07/2020 |
1882 | A1 Corporation | 06/20/2019 |
1882 | A1 Incorporated | 05/16/2016 |
Into this:
Dupe Id # | Account Name | Created Date | Dupe # |
1567 | Joe's Plumbing | 08/01/2019 | 0 |
1567 | Joes plubming | 02/07/2020 | 1 |
1882 | A1, Inc. | 03/15/2015 | 0 |
1882 | A1 Incorporated | 05/16/2016 | 1 |
1882 | A1 Corporation | 06/20/2019 | 2 |
I want the original account to have a value of 0, the first dupe 1, 2nd dupe 2, etc.
I had a way to do this in excel using a multi column sort and a countif function like this =COUNTIF(A$2:A2,A2)-1, but what was working in excel for 100k rows and 10 columns, is not working with 700k rows and 24 columns.
My SQLite knowledge is currently at a beginner level. I understand the basics, but not really sure where to begin with a problem like this. I know how to sort by a single column in SQLite, but I don't know how to deal with the countif part of the problem (and maybe there is a better way with SQLite?).
Any help is much appreciated . . . .
Thanks
Upvotes: 0
Views: 91
Reputation: 164139
First thing you must do is change the format of your dates to YYYY-MM-DD
because this is the only valid format for dates in SQLite and it is comparable.
With ROW_NUMBER()
window function:
select *,
row_number() over (partition by Id order by CreatedDate) - 1 Dupe
from tablename
order by id, Dupe
See the demo.
Results:
| Id | AccountName | CreatedDate | Dupe |
| ---- | --------------- | ----------- | ---- |
| 1567 | Joe's Plumbing | 2019-08-01 | 0 |
| 1567 | Joes plubming | 2020-02-07 | 1 |
| 1882 | A1, Inc. | 2015-03-15 | 0 |
| 1882 | A1 Incorporated | 2016-05-16 | 1 |
| 1882 | A1 Corporation | 2019-06-20 | 2 |
Upvotes: 1