KacieHouser
KacieHouser

Reputation: 2125

left join not doing as expected with sum and group by

This is all going to have to be pseudo as I am on my phone and have no internet access right now as I have just moved but its bugging the crap out of me. This also means I can't do code blocks please bear with me: I'll try.

I have a table with amounts in it, and I have a table with labels. I want to sum the amounts in the first table grouped by the labels. The problem is, if there are no records for a label existing in the table with the amounts then I don't get a record in the result set for that label. I need a record there with nulls for the amount tables field. Here is what some sample data might look like:

Amount_table:

Columns: id, tpa, amt, link_to_label_table

Data:

1, GTL, 2000, 1
2, GTL, 1000, 1

Label_table:

Columns: link_to_amount_table, label_name

Data:

1, Label1
2, Label2

Query:

Select at.tpa, sum(at.amt) as amt, lt.label_name
From Amount_table as at
Left join Label_tabl lt on lt.link_to_amount_table = at.link_to_label_table
Where at.tpa = 'GTL' 
Group by lt.label, at.tpa

Now this returns:

GTL, 3000, Label1

I tried selecting from the labels table then left joining the amount table and it still didn't give my desired results which are:

GTL, 3000, Label1
Null, Null, Label2

Is this possible with the sum and group by? The fields being grouped by have to be there otherwise you get an error. This is in DB2 by the way. Is there any way possible to get this to return the way I need it? I have to get the labels; they are dynamic.

Upvotes: 0

Views: 2314

Answers (2)

Jonathan Leffler
Jonathan Leffler

Reputation: 754590

On the face of it, you want to have your labels table as the dominant table and the amounts table as the one that is outer joined.

SELECT a.tpa, sum(a.amt) as amt, l.label_name
  FROM Label_table AS l
  LEFT JOIN Amount_table AS a
    ON l.link_to_amount_table = a.link_to_label_table
 GROUP BY l.label, a.tpa

You have a condition Amount_table.tpa = 'GTL'; it is not entirely clear why you have that, but presumably it is significant with more data in the tables. There are (at least) two ways you can incorporate that condition into the query (other than the one you chose - which eliminates the rows where a.tpa is null).

SELECT a.tpa, sum(a.amt) as amt, l.label_name
  FROM Label_table AS l
  LEFT JOIN Amount_table AS a
    ON l.link_to_amount_table = a.link_to_label_table
   AND a.tpa = 'GTL'
 GROUP BY l.label, a.tpa

Or:

SELECT a.tpa, sum(a.amt) as amt, l.label_name
  FROM Label_table AS l
  LEFT JOIN (SELECT *
               FROM Amount_table
              WHERE tpa = 'GTL') AS a
    ON l.link_to_amount_table = a.link_to_label_table
 GROUP BY l.label, a.tpa

A decent optimizer will produce the same query plan for both, so it probably doesn't matter which you use. There's an argument that suggests the second alternative is cleaner in that the ON clause is primarily for joining conditions, and the filter condition on a.tpa is not a joining condition. There's another argument that says the first alternative avoids a sub-query and is therefore preferable. I'd validate that the query plans are the same and would probably choose the second, but it is a somewhat nebulous decision based on a mild preference.

Upvotes: 2

John Pick
John Pick

Reputation: 5650

You were so close on your second try. Change WHERE to AND. This has the effect of applying at.tpa='GTL' to the JOIN instead of applying it to the filter so you don't filter out the NULLs.

Upvotes: 0

Related Questions