Reputation:
The salient feature of this problem is trying to get two columns from a row without a second query returning the same row. I've included more information for context.
I have the following (simplified) tables which represent documents sent to us by customers, scanned in batches by users here.
Batches: Id, ... Documents: Id, CustomerId, ... Documents_Batches: Id, BatchId, DocumentId
And document history (Creation, state changes, edits, etc.):
DocumentEvents: Id, DocumentId, UserId, Occurred (datetime)
What I want is list of documents in a given batch, plus some event data:
Result: DocumentId, CustomerId, Created, CreatedBy, ...
How do I get both the Created date, and the CreatedBy value in that same row?
ALTER PROCEDURE [dbo].[sp_GetBatchDocuments]
@BatchId INT
AS
BEGIN
SELECT
Documents.Id,
Documents.CustomerId,
MIN(DocumentEvents.Occurred) AS Created,
/* UserId value of the 'Created' row, AS CreatedBy */
MAX(DocumentEvents.Occurred) AS Modified
/* UserId value of the 'Modfied' row, AS ModifiedBy */
FROM
Documents
INNER JOIN Documents_Batches
ON Documents.Id = Documents_Batches.DocumentId
INNER JOIN DocumentEvents
ON Documents.Id = DocumentEvents.DocumentId
WHERE Documents_Batches.BatchId = @BatchId;
END
Although I could likely get them beforehand, or with a function call, every case I can think of would mean multiple queries of the same row.
EDIT: Barring some surprise idea from SO, I've concluded that this isn't logically possible without a second query to the same row (for each Date/User column pair I want). In order to make this happen, SQL would need a Row-valued (vs. a table-valued) function, and internally, that would need first filter by DocumentId, and then filter that result by the lowest/highest date. No matter the approach, it's two queries. Maybe time to reassess the normalization strategy for this data.
Upvotes: 0
Views: 113
Reputation: 2186
I would do it like this. Two joins are not redundant, they select different information.
SELECT
Documents.Id,
Documents.CustomerId,
MinTable.Created,
MinTable.UserId AS CreatedBy,
MaxTable.Modified,
MaxTable.UserId AS ModifiedBy
FROM
Documents
INNER JOIN Documents_Batches
ON Documents.Id = Documents_Batches.DocumentId
INNER JOIN (SELECT Occurred AS Created, UserId, DocumentId FROM DocumentEvents GROUP BY DocumentId, CustomerId HAVING Occurred = MIN(Occurred)) AS MinTable
ON Documents.Id = MinTable.DocumentId
INNER JOIN (SELECT Occurred AS Modified, UserId, DocumentId FROM DocumentEvents GROUP BY DocumentId, CustomerId HAVING Occurred = MAX(Occurred)) AS MaxTable
ON Documents.Id = MaxTable.DocumentId
WHERE Documents_Batches.BatchId = @BatchId;
Upvotes: 1
Reputation: 3684
Playing with the CTE and the ROW_NUMBER function you can do something like
WITH MinMax AS (
SELECT d.Id
, d.CustomerId
, de.Occurred
, de.UserId
, RowAsc = ROW_NUMBER() OVER (PARTITION BY d.Id, d.CustomerId
ORDER BY de.Occurred)
, RowDesc =ROW_NUMBER() OVER (PARTITION BY d.Id, d.CustomerId
ORDER BY de.Occurred Desc)
FROM Documents d
INNER JOIN Documents_Batches d_b ON d.Id = d_b.DocumentId
INNER JOIN DocumentEvents de ON d.Id = d_e.DocumentId
WHERE d_b.BatchId = @BatchId;
)
SELECT Id, CustomerId
, Created = Max(Case When RowAsc = 1 Then Occurred Else Null End)
, CreatedBy = Max(Case When RowAsc = 1 Then UserId Else Null End)
, Modified = Max(Case When RowDesc = 1 Then Occurred Else Null End)
, ModifiedBy = Max(Case When RowDesc = 1 Then UserId Else Null End)
FROM MinMax
WHERE 1 IN (RowAsc, RowDesc)
GROUP BY Id, CustomerId
In MinMax the row with RowAsc = 1 is the row with the min date and the row with RowDesc = 1 is the row with the max date for the Id, CustomerId group
Upvotes: 3