Susan Din
Susan Din

Reputation: 11

How to write a SQL query with no duplicates

How to write a SQL query with no duplicates correctly ?

There are two tables. Table 1 CUSTOMERS and table 2 ORDERS

CUSTOMERS (ID, FIRSTNAME, LASTNAME, ADDRESS); 
ORDERS (ID, PRODUCT_NAME, PRODUCT_PRICE, DATE_ORDER, ID_CUSTOMER, AMOUNT);

ID in CUSTOMERS table is a primary key, and ORDERS table has ID_CUSTOMER a foreign key How should I write a query to show the ID of a customers without duplicates, who have ordered 'Apple MacBook Air 13' ?

 CUSTOMERS (ID, FIRSTNAME, LASTNAME, ADDRESS); 
 ORDERS (ID, PRODUCT_NAME, PRODUCT_PRICE, DATE_ORDER, ID_CUSTOMER, AMOUNT); 

SELECT DISTINCT CUSTOMERS.ID, ORDERS.PRODUCT_NAME 
FROM CUSTOMERS 
INNER JOIN ORDERS 
ON CUSTOMERS.ID = ORDERS.ID_CUSTOMERS
WHERE PRODUCT_NAME = ‘Apple MacBook Air 13’; 

I have written a query but it seems unclear whether to use DISTINCT operator with INNER JOIN. Is it possible to have DISTINCT and joins ?

Upvotes: 0

Views: 3058

Answers (4)

Thorsten Kettner
Thorsten Kettner

Reputation: 94914

DISTINCT is a meant to remove duplicates. This is useful sometimes. But it is also often an indicator for a badly written query, because: why are there duplicate rows in the result in the first place?

As to your query: Why are you joining the customer table? It doesn't provide anything we don't get from the orders table already. Your query can be shortened to:

SELECT DISTINCT id_customers, 'Apple MacBook Air 13'
FROM orders 
WHERE product_name = 'Apple MacBook Air 13';

Or, as it's only about one product anyway:

SELECT DISTINCT id_customers
FROM orders 
WHERE product_name = 'Apple MacBook Air 13';

We are using DISTINCT here, because a customer can order a product multiple times and would hence show up in the results multiple times, didn't we use DISTINCT. However, we read all 'Apple MacBook Air 13' rows, only to remove some or even many of them in the end.

We can write the query differently to avoid that. We start with the customers table and show those IDs / rows where we find at least one order on 'Apple MacBook Air 13'.

SELECT id
FROM customers
WHERE id IN
(
  SELECT id_customers
  FROM orders
  WHERE product_name = 'Apple MacBook Air 13'
);

or

SELECT id
FROM customers c
WHERE EXISTS
(
  SELECT *
  FROM orders o
  WHERE o.id_customers = c.id
  AND o.product_name = 'Apple MacBook Air 13'
);

In these queries we only select from the customers table (the ID here, but we could just as well show the customer's name etc.) and use the orders table for lookup. Once the DBMS finds an 'Apple MacBook Air 13' order for acustomer, it doesn't have to look for more such orders for the same customer. That can save quite some time. And we only get each customer once and don't have to apply DISTINCT (for which all result rows would have to be sorted and compared). The queries are also very readable (albeit a tad longer), because we clearly separate what table we are selecting rows from (FROM clause) and what the search conditions are (WHERE clause).

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269773

In this query:

SELECT c.ID, o.PRODUCT_NAME 
FROM CUSTOMERS c INNER JOIN
     ORDERS o
     ON c.ID = o.ID_CUSTOMERS
WHERE o.PRODUCT_NAME = 'Apple MacBook Air 13'; 

You can get duplicates for customers who have ordered the product more than once. With SELECT DISTINCT you would only get one row.

Next, the JOIN is not necessary. You can simply do:

SELECT DISTINCT o.ID_CUSTOMERS, o.PRODUCT_NAME 
FROM ORDERS o
WHERE o.PRODUCT_NAME = 'Apple MacBook Air 13'; 

Upvotes: 1

You can imagine DISTINCT as a form of an output filter which is applied when the query itself has finished. The JOIN is used to build the input to the query. So short answers is: yes, you can do it.

Upvotes: 3

Arthur Almeida
Arthur Almeida

Reputation: 568

Do you know why is duplicating, right? It's because for each customers you could have 1 or more customers_orders.

So if you want to still keep filtering by the product and remove the duplicates you can use the DISTINCT clause in this case and at the SELECT statement just reference fields from the CUSTOMERS table. If you have any field from the CUSTOMERS_ORDERS table being referenced on the SELECT it will duplicate.

Upvotes: 0

Related Questions