Annalix
Annalix

Reputation: 480

Correct use of UNION in SQL

I need to compute the difference between two different rating averages, one for rating of movies before a specific year_cutoff (1980) and the second one after year_cutoff, querying from two different databases, Rating and Movies.

What I did is:

I need to compute the difference of the average stars between the movies pre-1980 and post-1980.

At first I GROUPED BY movie titles and compute the rating average for each group.

Secondly, I divided these groups into two categories, pre-1980 and post-1980

Finally, I am trying to recompute the average (of the average) for each of these two groups, namely avgBefore and avgAfter and compute the difference of this two new averages

I expect one number that is the avgBefore - avgAfter (avgB - avgA)

Below is my attempt for the code. My main problem is to insert correctly the two conditions 'before' and 'after' 1980. I am trying to define aliases like avgB and avgA, but apparently the UNION clause is not properly called.

SELECT AVG(avgB) - AVG(avgA)
FROM(
SELECT AVG(stars) as avgB
FROM Rating
JOIN Movie
ON Rating.mID = Movie.mID 
GROUP BY title
HAVING year < 1980
UNION
SELECT AVG(stars) as avgA
FROM Rating
JOIN Movie
ON Rating.mID = Movie.mID 
GROUP BY title
HAVING year > 1980
);

Upvotes: 0

Views: 116

Answers (6)

FDavidov
FDavidov

Reputation: 3675

Try this:

SELECT AVG(avgB) - AVG(avgA)
  FROM(
       SELECT AVG(stars) as avgB
         FROM Rating
         JOIN Movie
         ON Rating.mID = Movie.mID 
        WHERE Movie.year < 1980
       GROUP BY title
      UNION
       SELECT AVG(stars) as avgA
         FROM Rating
         JOIN Movie
         ON Rating.mID = Movie.mID 
        WHERE Movie.year >= 1980
       GROUP BY title
      );

NOTE: I don't know where you get the year value from, so you will need to specify it.

UPDATE: fixed reference to year.

UPDATE 2: Corrected query.

Here is what I came up. Please note that I don't have access right now to a DB to verify syntax, but I guess it should be mostly OK:

SELECT (AVG(B.Rating_OLD) - AVG(B.Rating_NEW)) AS Rating_Diff
  FROM (
        SELECT A.title , AVG(A.stars_OLD) AS Rating_OLD ,AVG(A.stars_NEW) AS Rating_NEW
          FROM (
                SELECT title                                    , 
                       CASE 
                          WHEN Movie.year <  1980 THEN Rating.stars
                          ELSE                         0
                       END AS stars_OLD                         ,
                       CASE 
                          WHEN Movie.year >= 1980 THEN Rating.stars
                          ELSE                         0
                       END AS stars_NEW                         
                  FROM Rating
                  JOIN Movie
                   ON Rating.mID = Movie.mID
               ) A
         GROUP BY A.title 
       ) B

Upvotes: 0

Thorsten Kettner
Thorsten Kettner

Reputation: 95053

You want to get the average rating per movie first. From this result you want to get the average movie rating both pre-1980 and post-1980. Two steps hence: an aggregation per movie, then a total aggregation.

select
  avg(r.stars) as total,
  avg(case when m.year < 1980 then r.stars end) as pre1980,
  avg(case when m.year > 1980 then r.stars end) as post1980,
  avg(case when m.year < 1980 then r.stars end) -
  avg(case when m.year > 1980 then r.stars end) as diff
from
(
  select mid, avg(stars) as stars
  from rating
  group by mid
) r
join movie m on m.mid = r.mid;

(As already mentioned, you may want to include 1980 movies in either the pre or the post range and not omit them completly.)

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270371

The most efficient way to do this uses conditional aggregation:

SELECT (AVG(CASE WHEN m.year < 1980 THEN r.stars END) -
        AVG(CASE WHEN m.year >= 1980 THEN r.stars END)
       ) AS averag
FROM Rating r JOIN
     Movie m
     ON r.mID = m.mID;

Trying to use two separate queries just makes the query more complicated than it needs to be.

Upvotes: 0

Rahul Jain
Rahul Jain

Reputation: 1399

Use join instead on UNION:

SELECT 
    AVG(avgB) - AVG(avgA)
FROM
(
    (
        SELECT
            AVG(stars) as avgB
        FROM 
            Rating
        JOIN 
            Movie
        ON 
            Rating.mID = Movie.mID 
        WHERE 
            year < 1980
    ) t1
    CROSS JOIN
    (
        SELECT
            AVG(stars) as avgA
        FROM 
            Rating
        JOIN 
            Movie
        ON 
            Rating.mID = Movie.mID 
        WHERE 
            year < 1980
    ) t2
);

Upvotes: 0

Shushil Bohara
Shushil Bohara

Reputation: 5656

What if we follow this approach:

SELECT 
    (AVG(CASE WHEN year < 1980 THEN stars ELSE 0 END)-
    AVG(CASE WHEN year >= 1980 THEN stars ELSE 0 END)) AS averag
FROM Rating
JOIN Movie ON Rating.mID = Movie.mID

Upvotes: 0

Katusic
Katusic

Reputation: 84

You are doing it wrong way. You can't have two different aliases in UNION. Your script knows only about alias avgB. Instead of UNION i suggest that you use JOIN.

Upvotes: 0

Related Questions