Talion83
Talion83

Reputation: 47

duplicate items in SQL view/query

I have a view in SQL which is bringing together some data in which I need to eliminate duplicate values. I have tried using DISTINCT and GROUP BY's without success. Basically what we have going on is a series of Uploaded Files which are attached to a Provider based upon the Type of document it is. They will upload multiple versions of the document as it goes through different phases of signatures.

Each time they upload a new phase of the document a new row is added to the UploadedDocuments table, the RequiredDocumentsID remains the same but the Filename in the UploadedFiles table (as well as the ID field in that table) are new.

Historically this hasn't been an issue because we normally look up this information one Provider at a time - in which case we just grab the most recent one for each document type. Now however we have a new page that is being worked on which needs to display ALL of the Providers at once, but it only needs to list each one ONCE and only list the most recent Filename/Path columns.

Below is the view that I have currently. As mentioned I have tried putting the first value as a 'DISTINCT dbo.ReqDocuments.ID' as well as doing a GroupBy. Both of theses have failed to eliminate any of the duplicates. I was thinking an embedded select or OUTER apply, but I my tSQL skills are not quite at that level yet.

SELECT dbo.UploadedFiles.FileName, dbo.UploadedFiles.FilePath,     
dbo.ReqDocuments.ProviderID, dbo.Providers.CompanyName, 
dbo.ReqDocuments.ID AS RequiredDocumentID, dbo.UploadedFiles.aDate,    
dbo.UploadedFiles.aUser
FROM  dbo.Providers 
INNER JOIN dbo.ReqDocuments ON dbo.Providers.ID = dbo.ReqDocuments.ProviderID 
INNER JOIN dbo.UploadedFiles ON dbo.ReqDocuments.ID = dbo.UploadedFiles.ReqDocumentsID
WHERE (dbo.ReqDocuments.DocumentID = 50)

Upvotes: 1

Views: 1152

Answers (3)

John Tseng
John Tseng

Reputation: 6352

Simply put, given a DocumentID, you want a list of (ProviderID, FilePath) where the FilePath is the most recent for that DocumentID and ProviderID combination.

I would rank all of your FilePaths partitioning by ProviderID and ordering by Date:

SELECT outerF.FileName, outerF.FilePath,     
    outerD.ProviderID, outerP.CompanyName, 
    outerD.ID AS RequiredDocumentID, outerF.aDate,    
    outerF.aUser
FROM  dbo.Providers outerP
INNER JOIN dbo.ReqDocuments outerD ON outerP.ID = outerD.ProviderID 
INNER JOIN dbo.UploadedFiles outerF ON outerD.ID = outerF.ReqDocumentsID
WHERE (outerD.DocumentID = 50)
AND outerF.aDate = (
    SELECT top 1 innerF.aDate
    FROM  dbo.ReqDocuments innerD
    INNER JOIN dbo.UploadedFiles innerF ON innerD.ID = innerF.ReqDocumentsID
    WHERE innerD.ProviderID = outerP.id
    AND innerD.DocumentID = outerD.DocumentID
    ORDER BY innerF.aDate DESC)

Upvotes: 1

Dejan Ciev
Dejan Ciev

Reputation: 142

This query find duplicate items

 SELECT t1.ID  FROM Table t1,Table t2  where  t1.Name=t2.Name and t1.ID>t2.ID

Upvotes: 0

Hart CO
Hart CO

Reputation: 34774

You can use a ROW_NUMBER() to solve this:

SELECT *
FROM (SELECT UploadedFiles.FileName, UploadedFiles.FilePath,    
ReqDocuments.ProviderID, Providers.CompanyName, 
dbo.ReqDocuments.ID AS RequiredDocumentID, dbo.UploadedFiles.aDate,    
dbo.UploadedFiles.aUser
  , ROW_NUMBER () OVER (PARTITION BY  ReqDocuments.ProviderID, Providers.CompanyName, ReqDocuments.ID ORDER BY UploadedFiles.aDate DESC) as RowRank
FROM  dbo.Providers 
INNER JOIN dbo.ReqDocuments ON dbo.Providers.ID = dbo.ReqDocuments.ProviderID 
INNER JOIN dbo.UploadedFiles ON dbo.ReqDocuments.ID = dbo.UploadedFiles.ReqDocumentsID
WHERE (dbo.ReqDocuments.DocumentID = 50)
)sub
WHERE RowRank = 1

PARTITION BY the fields that will not change with each upload, and ORDER BY the date descending to show the most recent one. You can run the inside query to get an idea of how the ROW_NUMBER() works.

Also, I like aliases, so here's this:

SELECT *
FROM (SELECT  upl.FILENAME
            , upl.FILEPATH
            , Req.ProviderID
            , prv.CompanyName
            , Req.ID AS RequiredDocumentID
            , upl.aDate
            , upl.aUser
            , ROW_NUMBER () OVER (PARTITION BY  Req.ProviderID, prv.CompanyName, Req.ID ORDER BY upl.aDate DESC) as RowRank
    FROM  Providers prv
    INNER JOIN ReqDocuments Req
            ON prv.ID = Req.ProviderID 
    INNER JOIN UploadedFiles upl
            ON Req.ID = upl.ReqDocumentsID
    WHERE (Req.DocumentID = 50)
    )sub
WHERE RowRank = 1

Upvotes: 1

Related Questions