How to rank an aggregation in each newly formed group?

Question

The problem below that I would like to solve sounds quite simple but it is quite difficult for me to arrive at the solution with SQL.

Let's say I have a simplified ERD that looks as follows:

This is a database of a hypothetical movie streaming website that keeps the data about its customers, invoice, movies and their genres. Each customer has a unique CustomerID, each purchase has a unique InvoiceID, each movie has a unique MovieID, each genre has a unique GenreID. One MovieID is only associated with one GenreID. One customer can purchase many movies, so one Invoice can contains many MovieIDs.

I would like to show the top 2 most popular genres (in terms of total count of InvoiceID) in each of the following continents/regions: Africa (Egypt, South Africa), Asia (China, Japan), Europe (France, Germany).

An example of the Customer and Genre tables are as follows:

The resulting table that I would like to have looks more or less like this (Region means Continent):

How exactly the resulting table look like is not important, I would just want the top 2 genres grouped by each region.

Here are some of my effort in producing the query:

WITH t1 AS (
      SELECT c.Country,
      CASE
        WHEN (
          (c.Country = "China" OR c.Country = "Japan") THEN "Asia"
        )
        WHEN (
          (c.Country = "Egypt" OR c.Country = "South Africa") THEN "Africa"
        )
        ELSE "Europe"
      END AS region
    FROM Customer c
    GROUP BY 1
    )

SELECT DISTINCT region
FROM t1;

I could only produce the new region groups but got stuck at how to rank the genres grouped by the regions. I have a difficulty in showing the ranking of genres (based on total count of InvoiceID) in each region. In other words, in each of the regions, how many customers purchased each genre. I have just started learning SQL that includes different kinds of JOINS, SUBQUERY, AGGREGATIONS, GROUP BY, ORDER BY and Common Table Expression but somehow I cannot produce the result I want. Could somebody please give some light on this? Thank you in advance for the help.

S-Man · Accepted Answer

demo:db<>fiddle

WITH cte AS (
    SELECT
        CASE WHEN country IN ('FR','GER') THEN 'europe'
             WHEN country IN ('CHI', 'JAP') THEN 'asia'
        END AS region,
        genre_name
    FROM
        joined_table
)
SELECT
    region, genre_name
FROM (
    SELECT
        region,
        genre_name,
        row_number() OVER (PARTITION BY region ORDER BY COUNT(*) DESC)
    FROM cte
    GROUP BY region, genre_name
) s
WHERE row_number <= 2

The CTE is for transforming the countries into regions (Obviously, the used table is the substitution for the whole join through your structure)
Group by the region and genre_name and count the genres.
This COUNT can be used to generate a row_number() for each region, ordered by the descending count order to get the highest counts to the top. So the most interesting genres become the first records for each group
Filter the two highest records.

How to rank an aggregation in each newly formed group?

Answers (1)

Related Questions