Reputation: 723
The problem below that I would like to solve sounds quite simple but it is quite difficult for me to arrive at the solution with SQL.
Let's say I have a simplified ERD that looks as follows:
This is a database of a hypothetical movie streaming website that keeps the data about its customers, invoice, movies and their genres. Each customer has a unique CustomerID, each purchase has a unique InvoiceID, each movie has a unique MovieID, each genre has a unique GenreID. One MovieID is only associated with one GenreID. One customer can purchase many movies, so one Invoice can contains many MovieIDs.
I would like to show the top 2 most popular genres (in terms of total count of InvoiceID) in each of the following continents/regions: Africa (Egypt, South Africa), Asia (China, Japan), Europe (France, Germany).
An example of the Customer and Genre tables are as follows:
The resulting table that I would like to have looks more or less like this (Region means Continent):
How exactly the resulting table look like is not important, I would just want the top 2 genres grouped by each region.
Here are some of my effort in producing the query:
WITH t1 AS (
SELECT c.Country,
CASE
WHEN (
(c.Country = "China" OR c.Country = "Japan") THEN "Asia"
)
WHEN (
(c.Country = "Egypt" OR c.Country = "South Africa") THEN "Africa"
)
ELSE "Europe"
END AS region
FROM Customer c
GROUP BY 1
)
SELECT DISTINCT region
FROM t1;
I could only produce the new region groups but got stuck at how to rank the genres grouped by the regions. I have a difficulty in showing the ranking of genres (based on total count of InvoiceID) in each region. In other words, in each of the regions, how many customers purchased each genre. I have just started learning SQL that includes different kinds of JOINS, SUBQUERY, AGGREGATIONS, GROUP BY, ORDER BY and Common Table Expression but somehow I cannot produce the result I want. Could somebody please give some light on this? Thank you in advance for the help.
Upvotes: 0
Views: 66
Reputation: 23676
WITH cte AS (
SELECT
CASE WHEN country IN ('FR','GER') THEN 'europe'
WHEN country IN ('CHI', 'JAP') THEN 'asia'
END AS region,
genre_name
FROM
joined_table
)
SELECT
region, genre_name
FROM (
SELECT
region,
genre_name,
row_number() OVER (PARTITION BY region ORDER BY COUNT(*) DESC)
FROM cte
GROUP BY region, genre_name
) s
WHERE row_number <= 2
region
and genre_name
and count the genres.COUNT
can be used to generate a row_number()
for each region, ordered by the descending count order to get the highest counts to the top. So the most interesting genres become the first records for each groupUpvotes: 2