Stephanus DJ
Stephanus DJ

Reputation: 109

How to join 2 tables together in a specific order when all join criteria match with TSQL

The problem i have is like this. I need to create a view that has the following information on it

Order Number, Order Line, Packing Slip Number, Invoice Number

There is more information but the above 4 are the ones that are the problem.

The system as it exists has no connection from packing slip to invoice but both packing slips and invoices are connected to order lines.

So lets say I have this information in the packing slip table

Order1 Line1 PackingSlip1 20
Order1 Line1 PackingSlip2 30
Order1 Line2 PackingSlip3 25
Order1 Line2 PackingSlip4 25

And this information in the invoice table

Order1 Line1 Invoice1 20
Order1 Line1 Invoice2 30
Order1 Line2 Invoice3 25
Order1 Line2 Invoice4 25

There are basically 2 scenarios here. The first one is easy because we can join the PackingSlip table to the InvoiceTable on quantities, order numbers and order lines. The issue is that if we do that the 2nd scenario (where the quantity is the same) would have 4 lines as output instead of just 2.

So my question is. How can I join these 2 tables together so that I match the order number and line. And then find the first matching quantity and join on that row. So I would want PackingSlip3 to match to Invoice3 and PackingSlip4 to match to Invoice4.

As additional information the packing table has a packing date and the invoice table has an invoice date. So I would like to join the earliest packed order,line to the earliest invoice order,line.

Any help would be greatly appreciated. Thank you

Edit. here is some sample code for the time being

Declare @InvoiceTable as Table(
    OrderNumber nvarchar(10),
    OrderLine   int,
    InvoiceNumber nvarchar(10),
    QuantityInvoiced int
)

Declare @PackingTable as Table(
    OrderNumber nvarchar(10),
    OrderLine   int,
    PackingNumber nvarchar(10),
    QuantityPacked int
)

Insert into @PackingTable(OrderNumber, OrderLine, PackingNumber, QuantityPacked) Values ('O1', 1, 'P1', 20),('O1', 1, 'P2', 30), ('O1', 2, 'P3', 25), ('O1', 2, 'P4', 25)
Insert into @InvoiceTable(OrderNumber, OrderLine, InvoiceNumber, QuantityInvoiced) Values ('O1', 1, 'I1', 20),('O1', 1, 'I2', 30), ('O1', 2, 'I3', 25), ('O1', 2, 'I4', 25);

with ctePackingSlip as
(
    select distinct OrderNumber, OrderLine, PackingNumber, QuantityPacked
    from @PackingTable
), cteInvoice as
(
    select distinct OrderNumber, OrderLine, InvoiceNumber, QuantityInvoiced
    from @InvoiceTable
)
SELECT t0.OrderNumber, t0.OrderLine, PackingNumber, InvoiceNumber, t0.QuantityPacked
FROM ctePackingSlip As t0
JOIN @InvoiceTable As t1
    ON t0.OrderNumber = t1.OrderNumber 
   AND t0.OrderLine = t1.OrderLine
   AND t0.QuantityPacked = t1.QuantityInvoiced

The problem is this outputs the following

O1 1 P1 I1 20

O1 1 P2 I2 30

O1 2 P3 I3 25

O1 2 P4 I3 25

O1 2 P3 I4 25

O1 2 P4 I4 25

As you can see at the end there is 2 lines for P3 and P4. I only want 1 line for each

Upvotes: 2

Views: 72

Answers (1)

Aaron Dietz
Aaron Dietz

Reputation: 10277

This is very important:

As additional information the packing table has a packing date and the invoice table has an invoice date. So I would like to join the earliest packed order,line to the earliest invoice order,line.

Assign the rows in each group a row number based on the packing/invoice dates, then add that to the join:

with ctePacking as
(
 SELECT OrderNumber, OrderLine, PackingSlipNumber, Quantity,
        ROW_NUMBER() OVER (PARTITION BY OrderNumber, OrderLine ORDER BY PackingDate ASC) RN
 FROM PackingSlip
)
, cteInvoice as
(
 SELECT OrderNumber, OrderLine, InvoiceNumber, Quantity,
        ROW_NUMBER() OVER (PARTITION BY OrderNumber, OrderLine ORDER BY InvoiceDate ASC) RN
 FROM Invoice
)

SELECT P.OrderNumber, P.OrderLine, P.PackingSlipNumber, I.InvoiceNumber
FROM ctePacking P
JOIN cteInvoice I ON P.OrderNumber = I.OrderNumber
              AND P.OrderLine = I.OrderLine
              AND P.RN = I.RN

Edit - Some explanation:

The idea here is that you know the times of the entries relate to each other (first packing slip entry = first invoice entry), but I assume those times can vary slightly so you can't join directly on them.

To solve this, we can number them by their ordered time, then join on that number.

ROW_NUMBER() is a window function that will assign a number to every row (as a new column) that we can then join on. The basic syntax is this:

ROW_NUMBER() OVER (PARTITION BY columnsInGroup ORDER BY columnsToOrderBy)

See here for more information on ROW_NUMBER().

In your case, we want to order by the datetime, and group by each order number and order line. If orders are always inserted perfectly one after the other, we could maybe remove the partition grouping altogether, but it may be safer to have, so:

ROW_NUMBER() OVER (PARTITION BY OrderNumber, OrderLine ORDER BY OrderDate) RN --RN is an alias for the new column

Now the easiest way to implement adding a row number column for us to reference in the main query is to use a CTE. This lets us create the column, alias it, and already have it available for the final query. An alternative option would be to use derived tables like this:

SELECT P.OrderNumber, P.OrderLine, P.PackingSlipNumber, I.InvoiceNumber
FROM (SELECT OrderNumber, OrderLine, PackingSlipNumber, Quantity,
            ROW_NUMBER() OVER (PARTITION BY OrderNumber, OrderLine ORDER BY PackingDate ASC) RN
     FROM PackingSlip) P
JOIN (SELECT OrderNumber, OrderLine, InvoiceNumber, Quantity,
            ROW_NUMBER() OVER (PARTITION BY OrderNumber, OrderLine ORDER BY InvoiceDate ASC) RN
     FROM Invoice) I ON P.OrderNumber = I.OrderNumber
                    AND P.OrderLine = I.OrderLine
                    AND P.RN = I.RN

For most people, CTEs are a much cleaner way to separate the logic and leave your final query simpler.

So to sum it up, we are using the CTEs to add row numbers to the base table data, then we are querying the CTEs instead of the base tables to access and use the new RN column in our join.

Upvotes: 2

Related Questions