Reputation: 85
I have a table with columns filename, projectname and recordinserttime. I want to get the distinct file names in a way that it checks for the first 5 numbers and matches it with other filenames to check for duplicates. If duplicates are present then it will just get the filename once for that file for the same project with the latest date on it.
Example 1: What I am getting now is both the files which are the same so present as duplicates as you can see that both files are inserted on different dates with filename having first 5 digits as same in the filename.
**Filename** **Projectname** **RecordInsertTime**
12345_abcde_mnopqrst Project1 08/10/2020
12345_abcde_rtyusety Project1 08/12/2020
Example 2:
**Filename** **Projectname** **RecordInsertTime**
56789_fghijk_mnopqrst Project1 08/10/2020
56789_fghijk_rtyusety Project1 08/12/2020
Desired Output: Desired output is to eliminate the duplicate and get the filename for the latest file (the file which has the latest record insert date on it) for the same project having duplicate filenames.
**Filename** **Projectname** **RecordInsertTime**
12345_abcde_mnopqrst Project1 08/12/2020
My Code:
Select replace(replace([FileName],'\\project\SSIS ReturnedReport(xyz)\',''),'.xlsx','') as FileName, ProjectName, cast(RecordInsertTime as DATE) as 'RecordInsertTime'from ReturnedFiles where ProjectName is not null
Thanks
Upvotes: 0
Views: 662
Reputation: 3363
There are a lot of ways you could do this, but here is my approach. Get the maximum RecordInsertTime for what I am calling the FilenamePrefix and Projectname and then join back to the ReturnedFiles table to get the full details related to that maximum row.
CREATE TABLE ReturnedFiles
(
Filename VARCHAR(25)
, Projectname VARCHAR(25)
, RecordInsertTime DATETIME
);
INSERT INTO ReturnedFiles
VALUES
('12345_abcde_mnopqrst', 'Project1', '08/10/2020')
, ('12345_abcde_rtyusety', 'Project1', '08/12/2020')
, ('56789_fghijk_mnopqrst', 'Project1', '08/10/2020')
, ('56789_fghijk_rtyusety', 'Project1', '08/12/2020');
SELECT
rf.Filename
, rf.Projectname
, rf.RecordInsertTime
FROM (
SELECT
LEFT(Filename, 5) AS [FilenamePrefix]
, Projectname
, MAX(RecordInsertTime) AS [MaxRecordInsertTime]
FROM ReturnedFiles
GROUP BY LEFT(Filename, 5)
, Projectname
) x
INNER JOIN ReturnedFiles rf ON x.FilenamePrefix = LEFT(rf.Filename, 5)
AND x.Projectname = rf.Projectname
AND x.MaxRecordInsertTime = rf.RecordInsertTime;
Demo.
Upvotes: 1