Reputation: 33
Stumbled across this problem that I could not find the solution too, need some help or point me in the right direction.
The tables categorize several movies into multiple genres, for example: Toy Story is in the genres; Children's, Comedy and Animation.
I am trying to find out which genres are most clear-cut and which are so muddled that they tend to be listed among several others for a typical movie.
For each genre in the data set, how many genres are the movies in that genre in, on average. For example, a Action movie is in 1.3 genres and the average Adventure movie is in 2.9 genres, so an Action movie is a better defined genre.
Below are samples of the tables but here is a sql editor with the tables and values in them for reference: https://blazerme.herokuapp.com/queries/new
3 tables:
genre table:
id name
1 Action
2 Adventure
3 Animation
4 Children's
5 Comedy
genre_movies table:
id movie_id genre_id
1 1(Toy Story) 3(Animation)
2 1(Toy Story) 4(Children's)
3 1(Toy Story) 5(Comedy)
4 2(GoldenEye) 1(Action)
5 2(GoldenEye) 2(Adventure)
6 2(GoldenEye) 16(Thriller)
movies table:
id title release_date
1 Toy Story (1995) 1995-01-01
2 GoldenEye (1995) 1995-01-01
3 Four Rooms (1995) 1995-01-01
My best effort so far is I try to Avg over the count of the genre_id. See below:
SELECT
name, AVG(c.count)
FROM
(SELECT
g.name AS name, COUNT(gm.genre_id) AS count
FROM
genres g
INNER JOIN
genres_movies gm ON g.id = gm.genre_id
INNER JOIN
movies m ON m.id = gm.movie_id
GROUP BY
g.name) c
GROUP BY
name
But that simply gives me a count of each genre from the genres_movies
table. I can't seem to figure out how to turn this into the average number of genres for movies in each genre.
Upvotes: 3
Views: 2897
Reputation: 113
You need to do a cartesian product (self-join) on the table so that you show for each movie, every combination of two genres. Then the aggregation/stat is more obvious.
OK. Here's the code.
Select Genres.Name As Genre,
Count(*)*1.0/Count(Distinct A.Movie_Id) As Average_Genres
From Genres_Movies A
Inner Join
Genres_Movies B
On A.Movie_Id=B.Movie_Id
Inner Join
Genres
On A.Genre_Id=Genres.Id
Group By Genres.Name
Order By 2 Desc
Upvotes: -1
Reputation: 344
I solve problems like this by first building the innermost query with the main info that I want, and then work outwards.
The core information needed is:
select movie_id, count(*) from genres_movies group by movie_id
The complete query:
select G.name, Round(avg(numgenres),2) as MuddleFactor
from genres_movies GM
inner join
(select movie_id, count(*) as numGenres
from genres_movies
group by movie_id) MNG
on MNG.movie_id = GM.movie_id
inner join genres G
on GM.genre_id = G.id
group by G.name
order by MuddleFactor desc
Upvotes: 3