Reputation: 47
I have a view in SQL which is bringing together some data in which I need to eliminate duplicate values. I have tried using DISTINCT and GROUP BY's without success. Basically what we have going on is a series of Uploaded Files which are attached to a Provider based upon the Type of document it is. They will upload multiple versions of the document as it goes through different phases of signatures.
Each time they upload a new phase of the document a new row is added to the UploadedDocuments table, the RequiredDocumentsID remains the same but the Filename in the UploadedFiles table (as well as the ID field in that table) are new.
Historically this hasn't been an issue because we normally look up this information one Provider at a time - in which case we just grab the most recent one for each document type. Now however we have a new page that is being worked on which needs to display ALL of the Providers at once, but it only needs to list each one ONCE and only list the most recent Filename/Path columns.
Below is the view that I have currently. As mentioned I have tried putting the first value as a 'DISTINCT dbo.ReqDocuments.ID' as well as doing a GroupBy. Both of theses have failed to eliminate any of the duplicates. I was thinking an embedded select or OUTER apply, but I my tSQL skills are not quite at that level yet.
SELECT dbo.UploadedFiles.FileName, dbo.UploadedFiles.FilePath,
dbo.ReqDocuments.ProviderID, dbo.Providers.CompanyName,
dbo.ReqDocuments.ID AS RequiredDocumentID, dbo.UploadedFiles.aDate,
dbo.UploadedFiles.aUser
FROM dbo.Providers
INNER JOIN dbo.ReqDocuments ON dbo.Providers.ID = dbo.ReqDocuments.ProviderID
INNER JOIN dbo.UploadedFiles ON dbo.ReqDocuments.ID = dbo.UploadedFiles.ReqDocumentsID
WHERE (dbo.ReqDocuments.DocumentID = 50)
Upvotes: 1
Views: 1152
Reputation: 6352
Simply put, given a DocumentID, you want a list of (ProviderID, FilePath) where the FilePath is the most recent for that DocumentID and ProviderID combination.
I would rank all of your FilePaths partitioning by ProviderID and ordering by Date:
SELECT outerF.FileName, outerF.FilePath,
outerD.ProviderID, outerP.CompanyName,
outerD.ID AS RequiredDocumentID, outerF.aDate,
outerF.aUser
FROM dbo.Providers outerP
INNER JOIN dbo.ReqDocuments outerD ON outerP.ID = outerD.ProviderID
INNER JOIN dbo.UploadedFiles outerF ON outerD.ID = outerF.ReqDocumentsID
WHERE (outerD.DocumentID = 50)
AND outerF.aDate = (
SELECT top 1 innerF.aDate
FROM dbo.ReqDocuments innerD
INNER JOIN dbo.UploadedFiles innerF ON innerD.ID = innerF.ReqDocumentsID
WHERE innerD.ProviderID = outerP.id
AND innerD.DocumentID = outerD.DocumentID
ORDER BY innerF.aDate DESC)
Upvotes: 1
Reputation: 142
This query find duplicate items
SELECT t1.ID FROM Table t1,Table t2 where t1.Name=t2.Name and t1.ID>t2.ID
Upvotes: 0
Reputation: 34774
You can use a ROW_NUMBER() to solve this:
SELECT *
FROM (SELECT UploadedFiles.FileName, UploadedFiles.FilePath,
ReqDocuments.ProviderID, Providers.CompanyName,
dbo.ReqDocuments.ID AS RequiredDocumentID, dbo.UploadedFiles.aDate,
dbo.UploadedFiles.aUser
, ROW_NUMBER () OVER (PARTITION BY ReqDocuments.ProviderID, Providers.CompanyName, ReqDocuments.ID ORDER BY UploadedFiles.aDate DESC) as RowRank
FROM dbo.Providers
INNER JOIN dbo.ReqDocuments ON dbo.Providers.ID = dbo.ReqDocuments.ProviderID
INNER JOIN dbo.UploadedFiles ON dbo.ReqDocuments.ID = dbo.UploadedFiles.ReqDocumentsID
WHERE (dbo.ReqDocuments.DocumentID = 50)
)sub
WHERE RowRank = 1
PARTITION BY
the fields that will not change with each upload, and ORDER BY
the date descending to show the most recent one. You can run the inside query to get an idea of how the ROW_NUMBER() works.
Also, I like aliases, so here's this:
SELECT *
FROM (SELECT upl.FILENAME
, upl.FILEPATH
, Req.ProviderID
, prv.CompanyName
, Req.ID AS RequiredDocumentID
, upl.aDate
, upl.aUser
, ROW_NUMBER () OVER (PARTITION BY Req.ProviderID, prv.CompanyName, Req.ID ORDER BY upl.aDate DESC) as RowRank
FROM Providers prv
INNER JOIN ReqDocuments Req
ON prv.ID = Req.ProviderID
INNER JOIN UploadedFiles upl
ON Req.ID = upl.ReqDocumentsID
WHERE (Req.DocumentID = 50)
)sub
WHERE RowRank = 1
Upvotes: 1