ryanb
ryanb

Reputation: 16287

Get SUM in GROUP BY with JOIN using MySQL

I have two tables in MySQL 5.1.38.

products
+----+------------+-------+------------+
| id | name       | price | department |
+----+------------+-------+------------+
|  1 | Fire Truck | 15.00 | Toys       |
|  2 | Bike       | 75.00 | Toys       |
|  3 | T-Shirt    | 18.00 | Clothes    |
|  4 | Skirt      | 18.00 | Clothes    |
|  5 | Pants      | 22.00 | Clothes    |
+----+------------+-------+------------+

ratings
+------------+--------+
| product_id | rating |
+------------+--------+
|          1 |      5 |
|          2 |      5 |
|          2 |      3 |
|          2 |      5 |
|          3 |      5 |
|          4 |      5 |
|          5 |      4 |
+------------+--------+

My goal is to get the total price of all products which have a 5 star rating in each department. Something like this.

+------------+-------------+
| department | total_price |
+------------+-------------+
| Clothes    | 36.00       |  /* T-Shirt and Skirt */
| Toys       | 90.00       |  /* Fire Truck and Bike */
+------------+-------------+

I would like to do this without a subquery if I can. At first I tried a join with a sum().

select department, sum(price) from products
join ratings on product_id=products.id
where rating=5 group by department;
+------------+------------+
| department | sum(price) |
+------------+------------+
| Clothes    |      36.00 |
| Toys       |     165.00 |
+------------+------------+

As you can see the price for the Toys department is incorrect because there are two 5 star ratings for the Bike and therefore counting that price twice due to the join.

I then tried adding distinct to the sum.

select department, sum(distinct price) from products
join ratings on product_id=products.id where rating=5
group by department;
+------------+---------------------+
| department | sum(distinct price) |
+------------+---------------------+
| Clothes    |               18.00 |
| Toys       |               90.00 |
+------------+---------------------+

But then the clothes department is off because two products share the same price.

Currently my work-around involves taking something unique about the product (the id) and using that to make the price unique.

select department, sum(distinct price + id * 100000) - sum(id * 100000) as total_price
from products join ratings on product_id=products.id
where rating=5 group by department;
+------------+-------------+
| department | total_price |
+------------+-------------+
| Clothes    |       36.00 |
| Toys       |       90.00 |
+------------+-------------+

But this feels like such a silly hack. Is there a better way to do this without a subquery? Thanks!

Upvotes: 16

Views: 39357

Answers (4)

OMG Ponies
OMG Ponies

Reputation: 332791

Use:

  SELECT p.department,
         SUM(p.price) AS total_price
    FROM PRODUCTS p
    JOIN (SELECT DISTINCT 
                 r.product_id,
                 r.rating
            FROM RATINGS r) x ON x.product_id = p.id
                             AND x.rating = 5
GROUP BY p.department

Technically, this does not use a subquery - it uses a derived table/inline view.

Upvotes: 19

Thomas
Thomas

Reputation: 64674

The primary reason you are having trouble finding a solution is that the schema as presented is fundamentally flawed. You shouldn't allow a table to have two rows that are complete duplicates of each other. Every table should have a means to uniquely identify each row even if it is the combination of all columns. Now, if we change the ratings table so that it has an AUTO_INCREMENT column called Id, the problem is easier:

Select products.department, Sum(price) As total_price
From products
    Left Join ratings As R1
        On R1.product_id = products.id
            And R1.rating = 5
    Left Join ratings As R2
        On R2.product_id = R1.product_id
            And R2.rating = R1.rating
            And R2.Id > R1.Id
Where R2.Id Is Null
Group By products.department

Upvotes: 0

Tom H
Tom H

Reputation: 47402

I can't think of any way to do it without a subquery somewhere in the query. You could perhaps use a View to mask the use of a subquery.

Barring that, your best bet is probably to find the minimum data set needed to make the calculation and do that in the front end. Whether or not that's possible depends on your specific data - how many rows, etc.

The other option (actually, maybe this is the best one...) would be to get a new ORM or do without it altogether ;)

This view would allow you to bypass the subquery:

CREATE VIEW Distinct_Product_Ratings
AS
    SELECT DISTINCT
        product_id,
        rating
    FROM
        Ratings

Upvotes: -1

Erick Robertson
Erick Robertson

Reputation: 33082

You can do two queries. First query:

SELECT DISTINCT product_id FROM ratings WHERE rating = 5;

Then, take each of those ID's and manually put them in the second query:

SELECT   department, Sum(price) AS total_price
FROM     products
WHERE    product_id In (1,2,3,4)
GROUP BY department;

This is the work-around for not being able to use subqueries. Without them, there is no way to eliminate the duplicate records caused by the join.

Upvotes: -1

Related Questions