John
John

Reputation: 135

mysql INNER JOIN not working like I thought

Have two tables:

CREATE  TABLE IF NOT EXISTS `issue_details` (
    `id` INT NOT NULL AUTO_INCREMENT ,
    amt_offer_dlr INT NOT NULL DEFAULT 9999 COMMENT '564,510,000',
   maturity  DATE NOT NULL DEFAULT '1111-11-11' COMMENT '06/28/2012' ,
    fk_cusip6 VARCHAR(6) NOT NULL DEFAULT '' COMMENT '' ,
  PRIMARY KEY (id),
  CONSTRAINT con_issue_details__issuers FOREIGN KEY (fk_cusip6) REFERENCES issuers    (fk_cusip6) ON DELETE NO ACTION ON UPDATE CASCADE
)
ENGINE = InnoDB;

CREATE  TABLE IF NOT EXISTS `identifiers` (
  id     INT NOT NULL AUTO_INCREMENT ,
  fk_cusip6 VARCHAR(6)  NOT NULL DEFAULT 'NA' ,
  PRIMARY KEY (id) ,
  CONSTRAINT con_ident__cusip6  FOREIGN KEY (fk_cusip6) REFERENCES cusip_6 (cusip6) ON DELETE NO ACTION ON UPDATE CASCADE ,
)
COMMENT ''
 ENGINE = InnoDB;

issue_details table: note: each id is different can have multiple identical fk_cusip6 numbers

+-----+--------------+------------+----------------+
| id  | maturity     | fk_cusip6  |  amt_offer_dlr |
+-----+--------------+------------+----------------+
|  1  | 2013-06-28   | 567090     |  1000          |
|  2  | 2014-06-05   | 567090     |  1000          |
|  3  | 2013-06-05   | 567100     |  2500          |
|  3  | 2014-06-05   | 567100     |  2500          |
+-----+--------------+------------+----------------+

identifier table: note: each fk_cusip6 number is different

+--------+-----------+
| id     | fk_cusip6 |
+--------+-----------+
| 131472 | 567090    |
| 131473 | 567100    |
+--------+-----------+

I need to generate a query that will match each instance of fk_cusip6 in the identifier table, and then sum amt_offer_dlr for all instances that have the same fk_cusip6 in the issue_details table. The end result should be:

+--------------+---------+------------------------+
| ide.id | isd.fk_cusip6 |  SUM(isd.Amt_Offer_Dlr |
+--------------+---------+------------------------+
| 131472 | 567090        | 2000                   |
| 131473 | 567100        | 5000                   |
+--------------+---------+------------------------+

I tried:

SELECT CURRENT_DATE, FORMAT(SUM(isd.Amt_Offer_Dlr),0) 'Current Out Standing Debt:'
from muni.issue_details isd
INNER JOIN identifiers ide ON ide.fk_cusip6 = isd.fk_cusip6 AND  isd.fk_cusip6 = '567541'
where isd.maturity > CURRENT_DATE
;

This is summing all of the columns.

I think I need a sub-query, but am stuck on the details.

Thanks

Upvotes: 0

Views: 271

Answers (2)

John
John

Reputation: 136

Think of the query process in your SQL as a three-phase activity. This might not capture all the richness of what really happens, but it is logically accurate in your case.

Let's simplify the query to remove the date/maturity check and check for a single fk_cusip6 value, so it is just:

SELECT SUM(isd.Amt_Offer_Dlr) 
FROM muni.issue_details isd
INNER JOIN identifiers ide ON ide.fk_cusip6 = isd.fk_cusip6

Phase one is the joins of the tables to generate a list of all the rows to be considered. So in the simplified example above, we are looking at a join between issue_details and identifiers matching on fk_cusip6. This phase generates a list of four rows (one for each of the issue_details rows, with the corresponding identifiers row tacked on).

That gives you a structure like this:

+-----+--------------+------------+----------------+---------+----------------+
| id  | maturity     | fk_cusip6  |  amt_offer_dlr | ide.id  | ide.fk_cusip6  |
+-----+--------------+------------+----------------+---------+----------------+
|  1  | 2013-06-28   | 567090     |  1000          |  131472 |  567090        |
|  2  | 2014-06-05   | 567090     |  1000          |  131472 |  567090        |
|  3  | 2013-06-05   | 567100     |  2500          |  131473 |  567100        |
|  3  | 2014-06-05   | 567100     |  2500          |  131473 |  567100        |
+-----+--------------+------------+----------------+---------+----------------+

Now, a non-aggregating SQL statement like this:

SELECT *
FROM muni.issue_details isd
INNER JOIN identifiers ide ON ide.fk_cusip6 = isd.fk_cusip6

would return just that table. But you have an aggregation with the SUM(), so we go into phase two.

Phase two determines which rows should get bucketed together to perform the aggregation. I say bucketed, but in SQL terminology we would say which rows get grouped together, and that means potentially using a GROUP BY clause. In our query, though, there is no GROUP BY clause, so the query processor must choose to group all the rows into a single bucket. This isn't what you want, but without any instructions on how to group rows, this is what the processor must do. So your aggregation will perform a single aggregation over all four rows.

Phase three is the actual aggregation itself. This is where we sum up the results of all four rows. In the sample query above, we would just see an output of 1000+1000+2500+2500, or 7000. Again, not what you wanted.

Now consider this alternative query:

SELECT ide.id, isd.fk_cusip6, SUM(isd.Amt_Offer_Dlr) 
FROM muni.issue_details isd
INNER JOIN identifiers ide ON ide.fk_cusip6 = isd.fk_cusip6
GROUP BY ide.id, isd.fk_cusip6

Phase one of this query is the same as before; four rows are generated as a result of the inner join.

In phase two, though, now we have a GROUP BY, so the query processor will bucket the rows from phase one into separate buckets for each value of identifiers.id. So we get something like this:

+-------+-----+--------------+------------+----------------+---------+-----------------+ |bucket | id | maturity | fk_cusip6 | amt_offer_dlr | ide.id | iden.fk_cusip6 | |-------+-----+--------------+------------+----------------+---------+-----------------+ | A | 1 | 2013-06-28 | 567090 | 1000 | 131472 | 567090 | | A | 2 | 2014-06-05 | 567090 | 1000 | 131472 | 567090 | |-------+-----+--------------+------------+----------------+---------+-----------------+ | B | 3 | 2013-06-05 | 567100 | 2500 | 131473 | 567100 | | B | 3 | 2014-06-05 | 567100 | 2500 | 131473 | 567100 | +-------+-----+--------------+------------+----------------+---------+-----------------+

(sorry I couldn't figure out how to format a table this wide)

Bucket A is all rows with identifiers.id=131472, and bucket B with identifiers.id=131473.

Phase three performs the aggregation for each bucket, giving you the result you desired:

+--------------+---------+------------------------+
| ide.id | isd.fk_cusip6 |  SUM(isd.Amt_Offer_Dlr |
+--------------+---------+------------------------+
| 131472 | 567090        | 2000                   |
| 131473 | 567100        | 5000                   |
+--------------+---------+------------------------+

I don't think you'll need a subquery for this as long as the other search criteria you want to apply are straightforward. But even then, if the criteria get more complex, you'd have to figure out whether they go into the sub-query, the join clause, or the overall WHERE clause itself, which could lead to confusion. I try to keep the selects as simple as possible.

Upvotes: 2

cdhowie
cdhowie

Reputation: 168988

Try this:

SELECT
    i.id,
    i.fk_cusip6,
    x.amt_offer_dlr_sum

FROM identifiers i

INNER JOIN (
    SELECT
        id.fk_cusip6,
        SUM(id.amt_offer_dlr) AS amt_offer_dlr_sum
    FROM issue_details id
    GROUP BY id.fk_cusip6
) x
ON x.fk_cusip6 = i.fk_cusip6;

Upvotes: 2

Related Questions