J. Davidson
J. Davidson

Reputation: 3307

Left outer join on multiple tables

I have the following sql statement:

    select  
    a.desc
   ,sum(bdd.amount)
   from t_main c 
   left outer join t_direct bds on (bds.repid=c.id) 
   left outer join tm_defination def a on (a.id =bds.sId)
   where c.repId=1000000134
   group by a.desc;

When I run it I get the following result:

   desc       amount
   NW         12.00
   SW         10

When I try to add another left outer join to get another set of values:

   select  
    a.desc
   ,sum(bdd.amount)
   ,sum(i.amt)
   from t_main c 
   left outer join t_direct bds on (bds.repid=c.id) 
   left outer join tm_defination def a on (a.id =bdd.sId)
   left outer join t_ind i on (i.id=c.id)
   where c.repId=1000000134
   group by a.desc;

It basically doubles the amount field like:

         desc    amount   amt
         NW       24.00   234.00
         SE       20.00   234.00

While result should be:

        desc   amount   amt
        NW      12.00   234.00
        SE      10.00   NULL 

How do I fix this?

Upvotes: 4

Views: 90496

Answers (3)

Kevin Swann
Kevin Swann

Reputation: 1038

Left Outer Join - Driving Table Row Count

A left outer join may return more rows than there are in the driving table if there are multiple matches on the join clause.

Using MS SQL-Server:

DECLARE @t1 TABLE ( id INT )
INSERT INTO @t1 VALUES ( 1 ),( 2 ),( 3 ),( 4 ),( 5 );

DECLARE @t2 TABLE ( id INT )
INSERT INTO @t2 VALUES ( 2 ),( 2 ),( 3 ),( 10 ),( 11 ),( 12 );

SELECT * FROM @t1 t1
LEFT OUTER JOIN @t2 t2 ON t2.id = t1.id

This gives:

1   NULL
2   2
2   2
3   3
4   NULL
5   NULL

There are 5 rows in the driving table (t1), but 6 rows are returned because there are multiple matches for id 2.

So if an aggregate function is used, eg SUM() etc, grouped by the driving table column(s), this will give the wrong results.

To fix this, use derived tables or sub-queries to calculate the aggregate values, as already stated.

Left Outer Join - Multiple Tables

Where there are left outer joins over multiple tables, or any join for that matter, the query generates a series of derived tables in the order of joins.

SELECT * FROM t1
LEFT OUTER JOIN t2 ON t2.col2 = <...>
LEFT OUTER JOIN t3 ON t3.col3 = <...>

This is equivalent to:

SELECT * FROM
(
   SELECT * FROM t1
   LEFT OUTER JOIN t2 ON t2.col2 = <...>
) dt1
LEFT OUTER JOIN t3 ON t3.col3 = <...>

Here, for both queries, the results of the 1st left outer join are put into a derived table (dt1) which is then left outer joined to the 3rd table (t3).

For left outer joins over multiple tables, the order of the tables in the join clauses is critical.

Upvotes: 1

Sandr
Sandr

Reputation: 776

If you really need to receive the data as you mentioned, your can use sub-queries to perform the needed calculations. In this case you code may looks like the following:

select x.[desc], x.amount, y.amt
from
(
    select
         c.[desc]
       , sum (bdd.amount) as amount
       , c.id
    from t_main c 
    left outer join t_direct bds on (bds.repid=c.id) 
    left outer join tm_defination_def bdd on (bdd.id = bds.sId)
    where c.repId=1000000134
    group by c.id, c.[desc]
) x
left join
(
    select t.id, sum (t.amt) as amt 
    from t_ind t
    inner join t_main c
      on t.id = c.id
    where c.repID = 1000000134
    group by t.id
) y 
 on x.id = y.id

In the first sub-select you will receive the aggregated data for the two first columns: desc and amount, grouped as you need. The second select will return the needed amt value for each id of the first set. Left join between those results will gives the needed result. The addition of the t_main table to the second select was done because of performance issues.

Another solution can be the following:

select
     c.[desc]
   , sum (bdd.amount) as amount
   , amt = (select sum (amt) from t_ind where id = c.id)
from #t_main c 
left outer join t_direct bds on (bds.repid=c.id) 
left outer join tm_defination_def bdd on (bdd.id = bds.sId)
where c.repId = 1000000134
group by c.id, c.[desc]

The result will be the same. Basically, instead of using of nested selects the calculating of the amt sum is performing inline per each row of the result joins. In case of large tables the performance of the second solution will be worse that the first one.

Upvotes: 7

Mr Moose
Mr Moose

Reputation: 6344

Your new left outer join is forcing some rows to be returned in the result set a few times due to multiple relations most likely. Remove your SUM and just review the returned rows and work out exactly which ones you require (maybe restrict it to on certain type of t_ind record if that is applicable??), then adjust your query accordingly.

Upvotes: 6

Related Questions