user164226
user164226

Reputation:

Avoiding a redundant query in a join

The salient feature of this problem is trying to get two columns from a row without a second query returning the same row. I've included more information for context.

I have the following (simplified) tables which represent documents sent to us by customers, scanned in batches by users here.

          Batches: Id, ...
        Documents: Id, CustomerId, ...
Documents_Batches: Id, BatchId, DocumentId

And document history (Creation, state changes, edits, etc.):

   DocumentEvents: Id, DocumentId, UserId, Occurred (datetime)

What I want is list of documents in a given batch, plus some event data:

           Result: DocumentId, CustomerId, Created, CreatedBy, ...

How do I get both the Created date, and the CreatedBy value in that same row?

ALTER PROCEDURE [dbo].[sp_GetBatchDocuments]
@BatchId INT
AS
BEGIN
    SELECT 
        Documents.Id, 
        Documents.CustomerId, 
        MIN(DocumentEvents.Occurred) AS Created,
        /* UserId value of the 'Created' row, AS CreatedBy */
        MAX(DocumentEvents.Occurred) AS Modified
        /* UserId value of the 'Modfied' row, AS ModifiedBy */
    FROM
        Documents 
        INNER JOIN Documents_Batches        
        ON Documents.Id = Documents_Batches.DocumentId
        INNER JOIN DocumentEvents
        ON Documents.Id = DocumentEvents.DocumentId
        WHERE Documents_Batches.BatchId = @BatchId;
END

Although I could likely get them beforehand, or with a function call, every case I can think of would mean multiple queries of the same row.

EDIT: Barring some surprise idea from SO, I've concluded that this isn't logically possible without a second query to the same row (for each Date/User column pair I want). In order to make this happen, SQL would need a Row-valued (vs. a table-valued) function, and internally, that would need first filter by DocumentId, and then filter that result by the lowest/highest date. No matter the approach, it's two queries. Maybe time to reassess the normalization strategy for this data.

Upvotes: 0

Views: 113

Answers (2)

Weltschmerz
Weltschmerz

Reputation: 2186

I would do it like this. Two joins are not redundant, they select different information.

SELECT 
    Documents.Id, 
    Documents.CustomerId, 

    MinTable.Created,
    MinTable.UserId AS CreatedBy,

    MaxTable.Modified,
    MaxTable.UserId AS ModifiedBy
FROM
    Documents 
    INNER JOIN Documents_Batches        
    ON Documents.Id = Documents_Batches.DocumentId

    INNER JOIN (SELECT Occurred AS Created, UserId, DocumentId FROM DocumentEvents GROUP BY DocumentId, CustomerId HAVING Occurred = MIN(Occurred)) AS MinTable
    ON Documents.Id = MinTable.DocumentId

    INNER JOIN (SELECT Occurred AS Modified, UserId, DocumentId FROM DocumentEvents GROUP BY DocumentId, CustomerId HAVING Occurred = MAX(Occurred)) AS MaxTable
    ON Documents.Id = MaxTable.DocumentId

    WHERE Documents_Batches.BatchId = @BatchId;

Upvotes: 1

Serpiton
Serpiton

Reputation: 3684

Playing with the CTE and the ROW_NUMBER function you can do something like

WITH MinMax AS (
SELECT d.Id
     , d.CustomerId
     , de.Occurred
     , de.UserId
     , RowAsc = ROW_NUMBER() OVER (PARTITION BY d.Id, d.CustomerId
                                   ORDER BY de.Occurred)
     , RowDesc =ROW_NUMBER() OVER (PARTITION BY d.Id, d.CustomerId
                                   ORDER BY de.Occurred Desc)
FROM   Documents d
       INNER JOIN Documents_Batches d_b ON d.Id = d_b.DocumentId
       INNER JOIN DocumentEvents de ON d.Id = d_e.DocumentId
WHERE  d_b.BatchId = @BatchId;
)
SELECT Id, CustomerId
     , Created = Max(Case When RowAsc = 1 Then Occurred Else Null End)
     , CreatedBy = Max(Case When RowAsc = 1 Then UserId Else Null End)
     , Modified = Max(Case When RowDesc = 1 Then Occurred Else Null End)
     , ModifiedBy = Max(Case When RowDesc = 1 Then UserId Else Null End)
FROM   MinMax
WHERE  1 IN (RowAsc, RowDesc)
GROUP BY Id, CustomerId

In MinMax the row with RowAsc = 1 is the row with the min date and the row with RowDesc = 1 is the row with the max date for the Id, CustomerId group

Upvotes: 3

Related Questions