CharlieE
CharlieE

Reputation: 33

Average Genre per Movie in Genre SQL Server 2012

Stumbled across this problem that I could not find the solution too, need some help or point me in the right direction.

The tables categorize several movies into multiple genres, for example: Toy Story is in the genres; Children's, Comedy and Animation.

I am trying to find out which genres are most clear-cut and which are so muddled that they tend to be listed among several others for a typical movie.

For each genre in the data set, how many genres are the movies in that genre in, on average. For example, a Action movie is in 1.3 genres and the average Adventure movie is in 2.9 genres, so an Action movie is a better defined genre.

Below are samples of the tables but here is a sql editor with the tables and values in them for reference: https://blazerme.herokuapp.com/queries/new

3 tables:

genre table:

id  name
1   Action
2   Adventure
3   Animation
4   Children's
5   Comedy

genre_movies table:

id movie_id     genre_id
1  1(Toy Story) 3(Animation)
2  1(Toy Story) 4(Children's)
3  1(Toy Story) 5(Comedy)
4  2(GoldenEye) 1(Action)
5  2(GoldenEye) 2(Adventure)
6  2(GoldenEye) 16(Thriller)

movies table:

id title             release_date
1  Toy Story (1995)  1995-01-01
2  GoldenEye (1995)  1995-01-01
3  Four Rooms (1995) 1995-01-01

My best effort so far is I try to Avg over the count of the genre_id. See below:

SELECT 
    name, AVG(c.count)
FROM 
    (SELECT
         g.name AS name, COUNT(gm.genre_id) AS count
     FROM
         genres g 
     INNER JOIN
         genres_movies gm ON g.id = gm.genre_id
     INNER JOIN
         movies m ON m.id = gm.movie_id
     GROUP BY
         g.name) c
GROUP BY
    name

But that simply gives me a count of each genre from the genres_movies table. I can't seem to figure out how to turn this into the average number of genres for movies in each genre.

Upvotes: 3

Views: 2897

Answers (2)

Jeff Winchell
Jeff Winchell

Reputation: 113

You need to do a cartesian product (self-join) on the table so that you show for each movie, every combination of two genres. Then the aggregation/stat is more obvious.

OK. Here's the code.

Select Genres.Name As Genre,
        Count(*)*1.0/Count(Distinct A.Movie_Id) As Average_Genres
    From Genres_Movies A
            Inner Join
        Genres_Movies B
                On A.Movie_Id=B.Movie_Id
            Inner Join
        Genres
                On A.Genre_Id=Genres.Id
    Group By Genres.Name
    Order By 2 Desc

Upvotes: -1

seagulledge
seagulledge

Reputation: 344

I solve problems like this by first building the innermost query with the main info that I want, and then work outwards.

The core information needed is:

select movie_id, count(*) from genres_movies group by movie_id

The complete query:

select G.name, Round(avg(numgenres),2) as MuddleFactor
from genres_movies GM

inner join
(select movie_id, count(*) as numGenres
 from genres_movies
 group by movie_id) MNG 
 on MNG.movie_id = GM.movie_id

inner join genres G
on GM.genre_id = G.id

group by G.name
order by MuddleFactor desc

Upvotes: 3

Related Questions