AndreaNobili
AndreaNobili

Reputation: 42957

SQL: GROUP BY vs. DISTINCT. What is the best one in this situation?

I am not so into database and I have the following doubt related a query that I am runngin on a MySql DB.

I have this query:

SELECT
    CD.id                AS id, 
    CD.commodity_name_en AS commodity_name_en

FROM MarketDetails AS MD
INNER JOIN MarketDetails_CommodityDetails AS MD_CD
      ON MD.id = MD_CD.market_details_id 
INNER JOIN CommodityDetails AS CD
      on MD_CD.commodity_details_id = CD.id
WHERE MD.localization_id = 1

that returns something like this (representing the list of commodities in some markets):

id                   commodity_name_en                                                                                                                                                                                                                                              
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1                    Rice-Asia                                                                                                                                                                                                                                                      
2                    Apple banana                                                                                                                                                                                                                                                   
3                    Avocado                                                                                                                                                                                                                                                        
4                    Red onion                                                                                                                                                                                                                                                      
5                    White onion                                                                                                                                                                                                                                                    
6                    Green Beans                                                                                                                                                                                                                                                    
7                    Bell pepper                                                                                                                                                                                                                                                    
8                    Carrot                                                                                                                                                                                                                                                         
9                    Tomatoes                                                                                                                                                                                                                    
10                   Irish potatoes                                                                                                                                                                                                                                                 
11                   Maize                                                                                                                                                                                                                                                          
1                    Rice-Asia                                                                                                                                                                                                                                                      
3                    Avocado                                                                                                                                                                                                                                                        
5                    White onion                                                                                                                                                                                                                                                    
8                    Carrot                                                                                                                                                                                                                                                         
11                   Maize                                                                                                                                                                                                                                                          
2                    Apple banana                                                                                                                                                                                                                                                   
7                    Bell pepper                                                                                                                                                                                                                                                    
9                    Tomatoes                                                                                                                                                                                                                    
10                   Irish potatoes                                                                                                                                                                                                                                                 
1                    Rice-Asia   

As you can see these commodities can appear multiple times (because a specific commodity can be sold in multiple market).

I want to change my query in such a way that every commodities appear only once (because in the end I want the list of all the possible commodities without duplication).

So I know that I can do in this way:

SELECT
    CD.id                AS id, 
    CD.commodity_name_en AS commodity_name_en

FROM MarketDetails AS MD
INNER JOIN MarketDetails_CommodityDetails AS MD_CD
      ON MD.id = MD_CD.market_details_id 
INNER JOIN CommodityDetails AS CD
      on MD_CD.commodity_details_id = CD.id
WHERE MD.localization_id = 1
GROUP BY id

I am grouping by ID but I can group by also by name (it is the same).

My doubt is: can I use the distinct statment to obtain the same behavior?

Reading here it seems that it could be a solution: https://www.tutorialspoint.com/sql/sql-distinct-keyword.htm

So I try also this solution and it seems return the same result

SELECT DISTINCT
    CD.id                AS id, 
    CD.commodity_name_en AS commodity_name_en

FROM MarketDetails AS MD
INNER JOIN MarketDetails_CommodityDetails AS MD_CD
      ON MD.id = MD_CD.market_details_id 
INNER JOIN CommodityDetails AS CD
      on MD_CD.commodity_details_id = CD.id
WHERE MD.localization_id = 1

So what is the exact difference between the DISTINCT and the GROUP BY solution? And what is the smartest one in a case like mine?

Thank you

Upvotes: 0

Views: 110

Answers (2)

Thorsten Kettner
Thorsten Kettner

Reputation: 94914

Don't join everything only to have to remove from that result set later.

You want to select commodities that exist in certain markets. So you'd select from the commodities table and look up their markets in the WHERE clause (with an EXISTS or IN clause).

select id, commodity_name_en
from commoditydetails
where id in
(
  select md_cd.commodity_details_id
  from marketdetails_commoditydetails md_cd
  join marketdetails md on md.id = md_cd.market_details_id
  where md.localization_id = 1
);

Or, if you like that better, even without any join:

select id, commodity_name_en
from commoditydetails
where id in
(
  select commodity_details_id
  from marketdetails_commoditydetails
  where market_details_id in (select id from marketdetails where md.localization_id = 1)
);

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269753

Forget using either version. If you only want columns from one table, use exists instead:

SELECT CD.id, CD.commodity_name_en
FROM CommodityDetails CD
WHERE EXISTS (SELECT 1
              FROM MarketDetails MD INNER JOIN 
                   MarketDetails_CommodityDetails MD_CD
                   ON MD.id = MD_CD.market_details_id 
              WHERE MD_CD.commodity_details_id = CD.id AND
                    MD.localization_id = 1
             );

With this version, MySQL does not need to do aggregation on the entire result set -- and that can be a big cost savings. This should be able to take advantage of the indexes used for your original query.

Note: I removed the as for your column aliases. The default alias for CD.id is id. There is no reason to specify this explicitly (unless you really like typing and verbose queries).

As for your specific question, it was answered in the comments -- DISTINCT and GROUP BY should have very similar performance.

Upvotes: 5

Related Questions