MrsAdmin
MrsAdmin

Reputation: 548

SQL Query: Join causes triplicate result

Program SQL using mySQL (phpMyAdmin) & MySQL Workbench
Skill level None to speak of
Note The /PREFIX/ came with the SQL I contracted to be written 18 months ago.

Hi there,
I am trying to retrieve the Country, CountryID & State from my SQL database.
Unfortunately It is required that a second join be used as Country is in (orders T5); CountryID & State in (address_book T7).

From what I have read it could be too many joins? (I'm a little confused).

The Join causing the issue (I have tried: Inner, Left, Right joins):

left Join /*PREFIX*/address_book T7 On (T7.customers_id = T5.customers_id)

The full FROM clause:

From /*PREFIX*/products T1
    Left Join /*PREFIX*/orders_products T2 On (T1.products_id = T2.products_id)
    Inner Join /*PREFIX*/orders T5 On (T5.orders_id = T2.orders_id)
    Inner Join /*PREFIX*/products_description T3 On (T1.products_id = T3.products_id)
    Inner Join /*PREFIX*/customers T6 On (T5.customers_id = T6.customers_id)
    **left Join /*PREFIX*/address_book T7 On (T7.customers_id = T5.customers_id)**
    Left Join /*PREFIX*/paypal P1 On (P1.order_id = T5.orders_id)
    Left Join /*PREFIX*/manufacturers T4 On (T1.manufacturers_id = T4.manufacturers_id)
    Where (T5.date_purchased >= 20120101) 
    And (T5.date_purchased <= 20140227) 
    And T5.orders_status = 1

The query contains all store order information

Select
-- Other store Query --

-- Customer Information --
    T5.customers_id As CID,
    T6.customers_firstname As CFirst,
    T6.customers_lastname As CLast,
    T5.customers_name As CName,
    T5.customers_email_address As CEmail,
    T5.customers_country As CCountry,

    -- Following 2 lines from the 'FROM'
    T7.entry_country_id As CCountryID,
    T7.entry_state As CState,

I know it could do with some tidying, but I am not much more than a beginner at SQL, just enough to troubleshoot little bits of code, and paste it back together.

Using Copy/Create Statement

CREATE TABLE `address_book` (
  `address_book_id` int(11) NOT NULL AUTO_INCREMENT,
  `customers_id` int(11) NOT NULL DEFAULT '0',
  `entry_gender` char(1) NOT NULL DEFAULT '',
  `entry_company` varchar(64) DEFAULT NULL,
  `entry_firstname` varchar(32) NOT NULL DEFAULT '',
  `entry_lastname` varchar(32) NOT NULL DEFAULT '',
  `entry_street_address` varchar(64) NOT NULL DEFAULT '',
  `entry_suburb` varchar(32) DEFAULT NULL,
  `entry_postcode` varchar(10) NOT NULL DEFAULT '',
  `entry_city` varchar(32) NOT NULL DEFAULT '',
  `entry_state` varchar(32) DEFAULT NULL,
  `entry_country_id` int(11) NOT NULL DEFAULT '0',
  `entry_zone_id` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`address_book_id`),
  KEY `idx_address_book_customers_id_zen` (`customers_id`)
    ) ENGINE=MyISAM AUTO_INCREMENT=2917 DEFAULT CHARSET=utf8;

Upvotes: 0

Views: 556

Answers (2)

spencer7593
spencer7593

Reputation: 108480

The number of rows returned is not a matter of "too many" joins, per se, it's really a matter of the cardinality of the rows in the tables that are being joined.

For example, if a customer can have zero one or more delivery, then a query that does a join between customer and delivery is going to return "duplicate" customer, for any customer that has more than one delivery.

From what you've explained, it sounds as if a customer can (and does) have more than one associated row in address_book, and a query of just address_book on that particular customers_id will return three rows.

The behavior you observe is entirely expected.

There's basically two approaches to avoiding "duplicates" being returned. One is to use a the DISTINCT keyword or a GROUP BY clause to specify that rows be collapsed. The other general approach is to filter rows from the multi table so that the duplicates don't get generated, either using an inline view, or in some cases, a correlated subquery in the SELECT list.

For example, you could replace the JOIN to address book with a join to an inline view, to get a single address_book_id for each customers_id, and then join that to the address_book table to retrieve the other columns from the row with that id.

That is, replace this:

left Join /*PREFIX*/address_book T7 On (T7.customers_id = T5.customers_id)

with something like this:

LEFT 
JOIN ( SELECT T9.customers_id
            , MAX(T9.address_book_id) AS max_address_book_id
         FROM /*PREFIX*/address_book T9
        GROUP BY T9.customers_id
     ) T8
  ON T8.customers_id = T6.customers_id
LEFT
JOIN /*PREFIX*/address_book T7
  ON T7.customers_id = T8.customers_id
 AND T7.address_book_id = T8.max_address_book_id

Upvotes: 1

Ziouas
Ziouas

Reputation: 519

Propably customers can have multiple addresses and that causes result multiplication. You should try and read about how LEFT JOIN works

Upvotes: 2

Related Questions