Phil Klein

Reputation: 7514

How to most efficiently join two tables?

I have two tables which store Amounts and Adjustments for LineItemTypes of a specific ReportingPeriod. I am looking for the most efficient way to query the Amount and Adjustment for each ReportingPeriod/LineItemType combination that exists across the two tables.

The schemas are below:

@ReportingPeriodComposition (1030 rows - Table Variable)

Src int,
GroupReportingPeriodId int,
ReportingPeriodId int,
ClientId int,
PeriodDate date,
PRIMARY KEY CLUSTERED (Src, ReportingPeriodId)

Amount (~30,000,000 rows)

ReportingPeriodId int,
LineItemTypeId smallint,
Amount decimal,
PRIMARY KEY CLUSTERED (ReportingPeriodId, LineItemTypeId)

Adjustment (~180,000 rows)

ReportingPeriodId int,
LineItemTypeId smallint,
Amount decimal,
Comment nvarchar(2500),
AdjustmentId int,
UNIQUE CLUSTERED (ReportingPeriodId, LineItemTypeId)

I would like to select the Amounts and Adjustments by unique ReportingPeriodId/LineItemTypeId yielding the following result set:

| ReportingPeriodId | LineItemTypeId | Amount | Adjustment |

Currently I am using the following query, but I am curious to see if anyone has thoughts on how this can be done more efficiently. All suggestions welcome!

    SELECT
        rpc.ReportingPeriodId,
        COALESCE(a.LineItemTypeId, adj.LineItemTypeId) LineItemTypeId,
        a.Amount,
        adj.Amount Adjustment
    FROM @ReportingPeriodComposition rpc
    LEFT JOIN Watchlist.risk.Amount a
        ON rpc.ReportingPeriodId = a.ReportingPeriodId
    LEFT JOIN Watchlist.risk.Adjustment adj
        ON rpc.ReportingPeriodId = adj.ReportingPeriodId
        AND (a.ReportingPeriodId IS NULL OR a.LineItemTypeId = adj.LineItemTypeId)
    WHERE
        Src = @Src
        AND (a.LineItemTypeId IS NOT NULL OR adj.LineItemTypeId IS NOT NULL)
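As written, the second LEFT JOIN can only attach an adjustment to an existing Amount row (`a.LineItemTypeId = adj.LineItemTypeId`) or, when a period has no Amount rows at all, to the period itself. An adjustment for a line item that has no Amount row, in a period that does have other Amount rows, is therefore not returned. If the goal is every ReportingPeriodId/LineItemTypeId combination that exists in either table, one alternative is a FULL OUTER JOIN over the two sets, each pre-filtered to the periods for @Src. This is a self-contained sketch under stated assumptions: #Amount and #Adjustment are hypothetical stand-ins for Watchlist.risk.Amount and Watchlist.risk.Adjustment, the sample rows are invented, the result is captured in a hypothetical #Result table, and it says nothing about performance on the real 30-million-row table.

```sql
-- Hypothetical stand-ins for Watchlist.risk.Amount / Watchlist.risk.Adjustment
-- (decimal precision assumed).
CREATE TABLE #Amount (
    ReportingPeriodId int,
    LineItemTypeId smallint,
    Amount decimal(18, 2),
    PRIMARY KEY CLUSTERED (ReportingPeriodId, LineItemTypeId)
);
CREATE TABLE #Adjustment (
    ReportingPeriodId int,
    LineItemTypeId smallint,
    Amount decimal(18, 2),
    UNIQUE CLUSTERED (ReportingPeriodId, LineItemTypeId)
);
-- Table variable trimmed to the two columns the query uses.
DECLARE @ReportingPeriodComposition TABLE (
    Src int,
    ReportingPeriodId int,
    PRIMARY KEY CLUSTERED (Src, ReportingPeriodId)
);
DECLARE @Src int = 2;

-- Invented sample data: period 10 has amounts for line items 1 and 2 and an
-- adjustment only for line item 3; period 20 has only an adjustment.
INSERT INTO #Amount VALUES (10, 1, 100.00), (10, 2, 200.00);
INSERT INTO #Adjustment VALUES (10, 3, 5.00), (20, 1, 7.00);
INSERT INTO @ReportingPeriodComposition VALUES (2, 10), (2, 20);

WITH a AS (
    -- Amount rows for the periods that belong to @Src
    SELECT am.ReportingPeriodId, am.LineItemTypeId, am.Amount
    FROM @ReportingPeriodComposition rpc
    JOIN #Amount am ON am.ReportingPeriodId = rpc.ReportingPeriodId
    WHERE rpc.Src = @Src
), adj AS (
    -- Adjustment rows for the same periods
    SELECT ad.ReportingPeriodId, ad.LineItemTypeId, ad.Amount
    FROM @ReportingPeriodComposition rpc
    JOIN #Adjustment ad ON ad.ReportingPeriodId = rpc.ReportingPeriodId
    WHERE rpc.Src = @Src
)
SELECT
    COALESCE(a.ReportingPeriodId, adj.ReportingPeriodId) AS ReportingPeriodId,
    COALESCE(a.LineItemTypeId, adj.LineItemTypeId) AS LineItemTypeId,
    a.Amount,
    adj.Amount AS Adjustment
INTO #Result
FROM a
FULL OUTER JOIN adj
    ON adj.ReportingPeriodId = a.ReportingPeriodId
   AND adj.LineItemTypeId = a.LineItemTypeId;

SELECT * FROM #Result ORDER BY ReportingPeriodId, LineItemTypeId;
```

With this sample data the FULL OUTER JOIN returns four rows, including (10, 3) with a NULL Amount and an Adjustment of 5.00; the original join shape returns three, because that adjustment has no Amount row for line item 3 in period 10. Since (Src, ReportingPeriodId) is the primary key of @ReportingPeriodComposition, each CTE returns each Amount or Adjustment row at most once, so the FULL OUTER JOIN does not introduce duplicates.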

Note that the @Src variable determines which sources' rows are pulled from the @ReportingPeriodComposition table variable. The query returns ~138,000 rows.

Execution Plan XML

<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="" xmlns:xsd="" Version="1.1" Build="10.0.4064.0" xmlns="">
        <StmtSimple StatementCompId="9" StatementEstRows="104.769" StatementId="5" StatementOptmLevel="FULL" StatementOptmEarlyAbortReason="GoodEnoughPlanFound" StatementSubTreeCost="0.343989" StatementText="SELECT&#xD;&#xA;    rpc.ReportingPeriodId,&#xD;&#xA;    COALESCE(a.LineItemTypeId, adj.LineItemTypeId) LineItemTypeId,&#xD;&#xA;    a.Amount,&#xD;&#xA; adj.Amount Adjustment&#xD;&#xA;FROM @ReportingPeriodComposition rpc&#xD;&#xA;LEFT JOIN Rating.risk.Amount a&#xD;&#xA;   ON rpc.ReportingPeriodId = a.ReportingPeriodId&#xD;&#xA;LEFT JOIN Rating.risk.Adjustment adj&#xD;&#xA;  ON rpc.ReportingPeriodId = adj.ReportingPeriodId&#xD;&#xA;  AND (a.ReportingPeriodId IS NULL OR a.LineItemTypeId = adj.LineItemTypeId)&#xD;&#xA;WHERE&#xD;&#xA; Src = @Src&#xD;&#xA;    AND (a.LineItemTypeId IS NOT NULL OR adj.LineItemTypeId IS NOT NULL)" StatementType="SELECT" QueryHash="0x425781A4C1D20919" QueryPlanHash="0xF3E9DD0ADAD04044">
          <QueryPlan DegreeOfParallelism="1" CachedPlanSize="24" CompileTime="5" CompileCPU="5" CompileMemory="424">
            <RelOp AvgRowSize="31" EstimateCPU="1.04769E-05" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="104.769" LogicalOp="Compute Scalar" NodeId="0" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="0.343989">
                <ColumnReference Table="@ReportingPeriodComposition" Alias="[rpc]" Column="ReportingPeriodId" />
                <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="Amount" />
                <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="Amount" />
                <ColumnReference Column="Expr1006" />
                    <ColumnReference Column="Expr1006" />
                    <ScalarOperator ScalarString="CASE WHEN [Rating].[risk].[Amount].[LineItemTypeId] as [a].[LineItemTypeId] IS NOT NULL THEN [Rating].[risk].[Amount].[LineItemTypeId] as [a].[LineItemTypeId] ELSE [Rating].[risk].[Adjustment].[LineItemTypeId] as [adj].[LineItemTypeId] END">
                            <Compare CompareOp="IS NOT">
                                  <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="LineItemTypeId" />
                                <Const ConstValue="NULL" />
                              <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="LineItemTypeId" />
                              <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="LineItemTypeId" />
                <RelOp AvgRowSize="33" EstimateCPU="9.21971E-05" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="104.769" LogicalOp="Filter" NodeId="1" Parallel="false" PhysicalOp="Filter" EstimatedTotalSubtreeCost="0.343979">
                    <ColumnReference Table="@ReportingPeriodComposition" Alias="[rpc]" Column="ReportingPeriodId" />
                    <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="LineItemTypeId" />
                    <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="Amount" />
                    <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="LineItemTypeId" />
                    <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="Amount" />
                    <RunTimeCountersPerThread Thread="0" ActualRows="137631" ActualEndOfScans="1" ActualExecutions="1" />
                  <Filter StartupExpression="false">
                    <RelOp AvgRowSize="33" EstimateCPU="0.000437936" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="104.769" LogicalOp="Left Outer Join" NodeId="2" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="0.343886">
                        <ColumnReference Table="@ReportingPeriodComposition" Alias="[rpc]" Column="ReportingPeriodId" />
                        <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="LineItemTypeId" />
                        <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="Amount" />
                        <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="LineItemTypeId" />
                        <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="Amount" />
                        <RunTimeCountersPerThread Thread="0" ActualRows="137647" ActualEndOfScans="1" ActualExecutions="1" />
                      <NestedLoops Optimized="false" WithUnorderedPrefetch="true">
                          <ColumnReference Table="@ReportingPeriodComposition" Alias="[rpc]" Column="ReportingPeriodId" />
                          <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="ReportingPeriodId" />
                          <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="LineItemTypeId" />
                          <ColumnReference Column="Expr1009" />
                        <RelOp AvgRowSize="26" EstimateCPU="0.000437936" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="104.769" LogicalOp="Left Outer Join" NodeId="4" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="0.00711828">
                            <ColumnReference Table="@ReportingPeriodComposition" Alias="[rpc]" Column="ReportingPeriodId" />
                            <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="ReportingPeriodId" />
                            <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="LineItemTypeId" />
                            <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="Amount" />
                            <RunTimeCountersPerThread Thread="0" ActualRows="137647" ActualEndOfScans="1" ActualExecutions="1" />
                          <NestedLoops Optimized="false">
                              <ColumnReference Table="@ReportingPeriodComposition" Alias="[rpc]" Column="ReportingPeriodId" />
                            <RelOp AvgRowSize="11" EstimateCPU="0.0001581" EstimateIO="0.003125" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="1" LogicalOp="Clustered Index Seek" NodeId="5" Parallel="false" PhysicalOp="Clustered Index Seek" EstimatedTotalSubtreeCost="0.0032831" TableCardinality="0">
                                <ColumnReference Table="@ReportingPeriodComposition" Alias="[rpc]" Column="ReportingPeriodId" />
                                <RunTimeCountersPerThread Thread="0" ActualRows="1030" ActualEndOfScans="1" ActualExecutions="1" />
                              <IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" NoExpandHint="false">
                                    <ColumnReference Table="@ReportingPeriodComposition" Alias="[rpc]" Column="ReportingPeriodId" />
                                <Object Table="[@ReportingPeriodComposition]" Index="[PK__#6FDF7DF__F9ABEE3F71C7C670]" Alias="[rpc]" />
                                      <Prefix ScanType="EQ">
                                          <ColumnReference Table="@ReportingPeriodComposition" Alias="[rpc]" Column="Src" />
                                          <ScalarOperator ScalarString="[@Src]">
                                              <ColumnReference Column="@Src" />
                            <RelOp AvgRowSize="22" EstimateCPU="0.000272246" EstimateIO="0.003125" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="104.769" LogicalOp="Clustered Index Seek" NodeId="6" Parallel="false" PhysicalOp="Clustered Index Seek" EstimatedTotalSubtreeCost="0.00339725" TableCardinality="29974300">
                                <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="ReportingPeriodId" />
                                <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="LineItemTypeId" />
                                <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="Amount" />
                                <RunTimeCountersPerThread Thread="0" ActualRows="137631" ActualEndOfScans="1030" ActualExecutions="1030" />
                              <IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" NoExpandHint="false">
                                    <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="ReportingPeriodId" />
                                    <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="LineItemTypeId" />
                                    <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="Amount" />
                                <Object Database="[Rating]" Schema="[risk]" Table="[Amount]" Index="[PK_Amount]" Alias="[a]" IndexKind="Clustered" />
                                      <Prefix ScanType="EQ">
                                          <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="ReportingPeriodId" />
                                          <ScalarOperator ScalarString="@ReportingPeriodComposition.[ReportingPeriodId] as [rpc].[ReportingPeriodId]">
                                              <ColumnReference Table="@ReportingPeriodComposition" Alias="[rpc]" Column="ReportingPeriodId" />
                        <RelOp AvgRowSize="18" EstimateCPU="0.000165111" EstimateIO="0.003125" EstimateRebinds="103.769" EstimateRewinds="0" EstimateRows="1" LogicalOp="Clustered Index Seek" NodeId="7" Parallel="false" PhysicalOp="Clustered Index Seek" EstimatedTotalSubtreeCost="0.33565" TableCardinality="178911">
                            <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="LineItemTypeId" />
                            <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="Amount" />
                            <RunTimeCountersPerThread Thread="0" ActualRows="1" ActualEndOfScans="137647" ActualExecutions="137647" />
                          <IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" NoExpandHint="false">
                                <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="LineItemTypeId" />
                                <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="Amount" />
                            <Object Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Index="[IX_Adjustment_ReportingPeriodId_LineItemTypeId]" Alias="[adj]" IndexKind="Clustered" />
                                  <Prefix ScanType="EQ">
                                      <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="ReportingPeriodId" />
                                      <ScalarOperator ScalarString="@ReportingPeriodComposition.[ReportingPeriodId] as [rpc].[ReportingPeriodId]">
                                          <ColumnReference Table="@ReportingPeriodComposition" Alias="[rpc]" Column="ReportingPeriodId" />
                              <ScalarOperator ScalarString="[Rating].[risk].[Amount].[ReportingPeriodId] as [a].[ReportingPeriodId] IS NULL OR [Rating].[risk].[Amount].[LineItemTypeId] as [a].[LineItemTypeId]=[Rating].[risk].[Adjustment].[LineItemTypeId] as [adj].[LineItemTypeId]">
                                <Logical Operation="OR">
                                    <Compare CompareOp="IS">
                                          <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="ReportingPeriodId" />
                                        <Const ConstValue="NULL" />
                                    <Compare CompareOp="EQ">
                                          <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="LineItemTypeId" />
                                          <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="LineItemTypeId" />
                      <ScalarOperator ScalarString="[Rating].[risk].[Amount].[LineItemTypeId] as [a].[LineItemTypeId] IS NOT NULL OR [Rating].[risk].[Adjustment].[LineItemTypeId] as [adj].[LineItemTypeId] IS NOT NULL">
                        <Logical Operation="OR">
                            <Compare CompareOp="IS NOT">
                                  <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="LineItemTypeId" />
                                <Const ConstValue="NULL" />
                            <Compare CompareOp="IS NOT">
                                  <ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="LineItemTypeId" />
                                <Const ConstValue="NULL" />
              <ColumnReference Column="@Src" ParameterRuntimeValue="(2)" />

Upvotes: 3

Views: 966

Answers (3)

Martin Smith

Reputation: 452978

I would try replacing the @table_variable with a #temp table so that SQL Server has more accurate statistics to play with.

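Currently it assumes the table variable will return 1 row (the posted plan estimates 1 row for the seek on @ReportingPeriodComposition, against 1,030 actual) and chooses a nested loops plan accordingly. You may get a different plan if the optimizer can take the actual table cardinalities into account.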
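A minimal sketch of that change, assuming the column list from the question: a #temp table with the same primary key replaces the table variable, and the query is otherwise unchanged. SQL Server can create and use statistics on a #temp table, which it does not do for a table variable. To keep the sketch runnable on its own, #Amount and #Adjustment are hypothetical stand-ins for the Watchlist.risk tables, the sample rows are invented, and the output is captured in a hypothetical #Result table.

```sql
-- Hypothetical stand-ins for Watchlist.risk.Amount / Watchlist.risk.Adjustment.
CREATE TABLE #Amount (
    ReportingPeriodId int, LineItemTypeId smallint, Amount decimal(18, 2),
    PRIMARY KEY CLUSTERED (ReportingPeriodId, LineItemTypeId)
);
CREATE TABLE #Adjustment (
    ReportingPeriodId int, LineItemTypeId smallint, Amount decimal(18, 2),
    UNIQUE CLUSTERED (ReportingPeriodId, LineItemTypeId)
);
INSERT INTO #Amount VALUES (10, 1, 100.00), (10, 2, 200.00);
INSERT INTO #Adjustment VALUES (10, 3, 5.00), (20, 1, 7.00);

-- #temp table in place of DECLARE @ReportingPeriodComposition TABLE (...):
-- the optimizer can build statistics on it instead of assuming a single row.
CREATE TABLE #ReportingPeriodComposition (
    Src int,
    GroupReportingPeriodId int,
    ReportingPeriodId int,
    ClientId int,
    PeriodDate date,
    PRIMARY KEY CLUSTERED (Src, ReportingPeriodId)
);
-- Populate it the same way the table variable is populated today (sample rows here).
INSERT INTO #ReportingPeriodComposition (Src, ReportingPeriodId) VALUES (2, 10), (2, 20);

DECLARE @Src int = 2;

-- Same query as in the question, reading from the #temp table.
SELECT
    rpc.ReportingPeriodId,
    COALESCE(a.LineItemTypeId, adj.LineItemTypeId) AS LineItemTypeId,
    a.Amount,
    adj.Amount AS Adjustment
INTO #Result
FROM #ReportingPeriodComposition rpc
LEFT JOIN #Amount a
    ON rpc.ReportingPeriodId = a.ReportingPeriodId
LEFT JOIN #Adjustment adj
    ON rpc.ReportingPeriodId = adj.ReportingPeriodId
    AND (a.ReportingPeriodId IS NULL OR a.LineItemTypeId = adj.LineItemTypeId)
WHERE rpc.Src = @Src
    AND (a.LineItemTypeId IS NOT NULL OR adj.LineItemTypeId IS NOT NULL);

SELECT * FROM #Result ORDER BY ReportingPeriodId, LineItemTypeId;
```

The thing to compare afterwards is the estimated versus actual row count on the #ReportingPeriodComposition seek in the actual execution plan; whether a better estimate leads to a different join strategy against the real tables can only be confirmed by testing there.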

Upvotes: 0


Reputation: 27294

There is nothing particularly bad in the query plan you have posted that I can see; I suspect SQL Server is making the right choices. The only slightly dodgy thing I could spot is that the estimated and actual row counts are quite far apart (about 105 rows estimated against roughly 137,600 actual), which indicates the statistics are not entirely up to date. You could forcibly update the statistics and see whether it continues to use the same query plan.
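A sketch of checking when statistics were last updated and then rebuilding them from a full scan. It assumes a hypothetical #Amount temp table standing in for Watchlist.risk.Amount, which is why it switches to tempdb (sys.stats and STATS_DATE resolve against the current database); the commented lines show what the equivalent might look like against the real tables. FULLSCAN reads every row, so on a table of ~30 million rows it may take a while.

```sql
-- Against the real database this might be:
--   USE Watchlist;
--   UPDATE STATISTICS risk.Amount WITH FULLSCAN;
--   UPDATE STATISTICS risk.Adjustment WITH FULLSCAN;
USE tempdb;

-- Hypothetical stand-in for Watchlist.risk.Amount with invented rows.
CREATE TABLE #Amount (
    ReportingPeriodId int, LineItemTypeId smallint, Amount decimal(18, 2),
    PRIMARY KEY CLUSTERED (ReportingPeriodId, LineItemTypeId)
);
INSERT INTO #Amount VALUES (10, 1, 100.00), (10, 2, 200.00);

-- When was each statistics object on the table last updated?
SELECT s.name, STATS_DATE(s.object_id, s.stats_id) AS LastUpdated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('tempdb..#Amount');

-- Rebuild the statistics from every row rather than a sample.
UPDATE STATISTICS #Amount WITH FULLSCAN;

-- Same check again: LastUpdated should now reflect the rebuild.
SELECT s.name, STATS_DATE(s.object_id, s.stats_id) AS LastUpdated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('tempdb..#Amount');
```

After the update, re-run the original query and compare its actual execution plan with the one posted in the question.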

If you are having issues with inconsistent performance, try this on a dev box: clear the query plan cache and generate the query plan for an @Src value that produces very few rows, then clear the plan cache again and generate the query plan for an @Src value that produces a very large number of rows. If the two plans are the same, you are OK; if they differ, you may need to use the OPTIMIZE FOR hint. This sometimes happens with parameterized queries: the first execution determines the plan that sits in cache, and until that plan ages out, subsequent runs of the query reuse it.
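A sketch of the OPTIMIZE FOR hint on a parameterized statement, run through sp_executesql so that one cached plan is reused across @Src values in the way described above. It is a simplified version of the question's query (Amount only), and #ReportingPeriodComposition, #Amount, #Result and the sample rows are hypothetical stand-ins. The value 2 in the hint is a placeholder for whichever @Src value yields a plan that works acceptably for all sources; OPTIMIZE FOR (@Src UNKNOWN) is the alternative when no single value is representative. DBCC FREEPROCCACHE is left commented out because it empties the entire plan cache, so it belongs only on a dev box.

```sql
-- Hypothetical stand-ins with invented rows.
CREATE TABLE #ReportingPeriodComposition (
    Src int, ReportingPeriodId int,
    PRIMARY KEY CLUSTERED (Src, ReportingPeriodId)
);
CREATE TABLE #Amount (
    ReportingPeriodId int, LineItemTypeId smallint, Amount decimal(18, 2),
    PRIMARY KEY CLUSTERED (ReportingPeriodId, LineItemTypeId)
);
INSERT INTO #ReportingPeriodComposition VALUES (1, 10), (2, 10), (2, 20);
INSERT INTO #Amount VALUES (10, 1, 100.00), (10, 2, 200.00), (20, 1, 50.00);

CREATE TABLE #Result (ReportingPeriodId int, LineItemTypeId smallint, Amount decimal(18, 2));

-- DBCC FREEPROCCACHE;  -- dev box only: clears the whole plan cache so the
--                         next execution compiles a fresh plan.

-- Without the hint, the plan is compiled for the first @Src value seen and
-- then reused. With OPTIMIZE FOR, it is compiled for the value in the hint.
DECLARE @sql nvarchar(max) = N'
SELECT rpc.ReportingPeriodId, a.LineItemTypeId, a.Amount
FROM #ReportingPeriodComposition rpc
JOIN #Amount a ON a.ReportingPeriodId = rpc.ReportingPeriodId
WHERE rpc.Src = @Src
OPTION (OPTIMIZE FOR (@Src = 2));';

INSERT INTO #Result
EXEC sp_executesql @sql, N'@Src int', @Src = 1;   -- source with fewer rows
INSERT INTO #Result
EXEC sp_executesql @sql, N'@Src int', @Src = 2;   -- source with more rows

SELECT * FROM #Result ORDER BY ReportingPeriodId, LineItemTypeId;
```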

Beyond that, you would need to provide more information: what specific problem are you encountering, or hoping to solve, by having this reviewed?

Upvotes: 2


Reputation: 33143

What about using a JOIN HINT?

From MSDN:

LOOP | HASH | MERGE: Specifies that the join in the query should use looping, hashing, or merging. Using LOOP | HASH | MERGE JOIN enforces a particular join between two tables. LOOP cannot be specified together with RIGHT or FULL as a join type.

REMOTE: Specifies that the join operation is performed on the site of the right table. This is useful when the left table is a local table and the right table is a remote table. REMOTE should be used only when the left table has fewer rows than the right table.

If the right table is local, the join is performed locally. If both tables are remote but from different data sources, REMOTE causes the join to be performed on the site of the right table. If both tables are remote tables from the same data source, REMOTE is not required.

REMOTE cannot be used when one of the values being compared in the join predicate is cast to a different collation using the COLLATE clause.

REMOTE can be used only for INNER JOIN operations.

In your case you may be able to use a LOOP join, since you are dealing with a LEFT join. Other than that, your query looks fine. Do you have indexes on the columns you are filtering on?

Your Amount table does have a lot of rows, but I've seen databases with many more. What hardware are you working with?

For an example of how a LOOP JOIN hint can be used, and the particular optimization it produced, see this article. Whether a join hint helps depends on the query, though: it may not be applicable in your case and should be a last-resort option.
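A sketch of the hint applied to the question's query. The posted plan already shows Nested Loops for both left joins, so a LOOP hint may simply reproduce the current plan; HASH or MERGE would be the hints that force a different strategy. #ReportingPeriodComposition, #Amount, #Adjustment, #Result and the sample rows are hypothetical stand-ins so the sketch runs on its own.

```sql
-- Hypothetical stand-ins with invented rows.
CREATE TABLE #ReportingPeriodComposition (
    Src int, ReportingPeriodId int,
    PRIMARY KEY CLUSTERED (Src, ReportingPeriodId)
);
CREATE TABLE #Amount (
    ReportingPeriodId int, LineItemTypeId smallint, Amount decimal(18, 2),
    PRIMARY KEY CLUSTERED (ReportingPeriodId, LineItemTypeId)
);
CREATE TABLE #Adjustment (
    ReportingPeriodId int, LineItemTypeId smallint, Amount decimal(18, 2),
    UNIQUE CLUSTERED (ReportingPeriodId, LineItemTypeId)
);
INSERT INTO #ReportingPeriodComposition VALUES (2, 10), (2, 20);
INSERT INTO #Amount VALUES (10, 1, 100.00), (10, 2, 200.00);
INSERT INTO #Adjustment VALUES (10, 3, 5.00), (20, 1, 7.00);

DECLARE @Src int = 2;

-- LEFT OUTER LOOP JOIN forces a nested loops join for that outer join.
-- A join hint in the FROM clause also makes the optimizer enforce the join
-- order for all tables in the query, based on the position of the ON keywords.
SELECT
    rpc.ReportingPeriodId,
    COALESCE(a.LineItemTypeId, adj.LineItemTypeId) AS LineItemTypeId,
    a.Amount,
    adj.Amount AS Adjustment
INTO #Result
FROM #ReportingPeriodComposition rpc
LEFT OUTER LOOP JOIN #Amount a
    ON rpc.ReportingPeriodId = a.ReportingPeriodId
LEFT OUTER LOOP JOIN #Adjustment adj
    ON rpc.ReportingPeriodId = adj.ReportingPeriodId
    AND (a.ReportingPeriodId IS NULL OR a.LineItemTypeId = adj.LineItemTypeId)
WHERE rpc.Src = @Src
    AND (a.LineItemTypeId IS NOT NULL OR adj.LineItemTypeId IS NOT NULL);

-- Alternative: the query-level hint OPTION (LOOP JOIN) restricts every join in
-- the query to nested loops without fixing the join order.

SELECT * FROM #Result ORDER BY ReportingPeriodId, LineItemTypeId;
```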

Upvotes: 1
