Minhal
Minhal

Reputation: 85

How to get distinct filenames with different dates in SQL

I have a table with columns filename, projectname and recordinserttime. I want to get the distinct file names in a way that it checks for the first 5 numbers and matches it with other filenames to check for duplicates. If duplicates are present then it will just get the filename once for that file for the same project with the latest date on it.

Example 1: What I am getting now is both the files which are the same so present as duplicates as you can see that both files are inserted on different dates with filename having first 5 digits as same in the filename.

**Filename**                **Projectname**       **RecordInsertTime**
12345_abcde_mnopqrst          Project1              08/10/2020
12345_abcde_rtyusety          Project1              08/12/2020

Example 2:

**Filename**                **Projectname**       **RecordInsertTime**
56789_fghijk_mnopqrst          Project1              08/10/2020
56789_fghijk_rtyusety          Project1              08/12/2020

Desired Output: Desired output is to eliminate the duplicate and get the filename for the latest file (the file which has the latest record insert date on it) for the same project having duplicate filenames.

**Filename**                **Projectname**       **RecordInsertTime**
12345_abcde_mnopqrst          Project1              08/12/2020

My Code:

   Select replace(replace([FileName],'\\project\SSIS ReturnedReport(xyz)\',''),'.xlsx','') as FileName, ProjectName, cast(RecordInsertTime as DATE) as 'RecordInsertTime'from ReturnedFiles where ProjectName is not null

Thanks

Upvotes: 0

Views: 662

Answers (1)

Isaac
Isaac

Reputation: 3363

There are a lot of ways you could do this, but here is my approach. Get the maximum RecordInsertTime for what I am calling the FilenamePrefix and Projectname and then join back to the ReturnedFiles table to get the full details related to that maximum row.

CREATE TABLE ReturnedFiles
(
    Filename VARCHAR(25)
  , Projectname VARCHAR(25)
  , RecordInsertTime DATETIME
);

INSERT INTO ReturnedFiles
VALUES
  ('12345_abcde_mnopqrst', 'Project1', '08/10/2020')
, ('12345_abcde_rtyusety', 'Project1', '08/12/2020')
, ('56789_fghijk_mnopqrst', 'Project1', '08/10/2020')
, ('56789_fghijk_rtyusety', 'Project1', '08/12/2020');

SELECT
           rf.Filename
         , rf.Projectname
         , rf.RecordInsertTime
FROM       (
               SELECT
                        LEFT(Filename, 5)     AS [FilenamePrefix]
                      , Projectname
                      , MAX(RecordInsertTime) AS [MaxRecordInsertTime]
               FROM     ReturnedFiles
               GROUP BY LEFT(Filename, 5)
                      , Projectname
           )              x
INNER JOIN ReturnedFiles rf ON x.FilenamePrefix = LEFT(rf.Filename, 5)
                           AND x.Projectname = rf.Projectname
                           AND x.MaxRecordInsertTime = rf.RecordInsertTime;

Demo.

Upvotes: 1

Related Questions