Reputation: 110227
I am trying to get the top 3 providers by revenue, and for each of those providers, the top 2 titles for that provider alphabetically. Here is the basic GROUP BY
query I have to start with --
SELECT
provider,
title,
SUM(customer_price)
FROM
[integrated-myth-156821:fintest.m10]
GROUP BY
title,
provider
LIMIT
25
Here is the result set I get:
And the table structure is provider
| title
| customer_price
.
Here is an illustration of the result I am looking to get:
provider title sum(revenue)
Sony 10000 // Top provider by revenue, limit 3
Sony Avatar 400 // First title alphabetically, limit 2
Sony Titanic 300
Fox 8000
Fox Fresh 12
Fox Mike 800
WB 4500
WB Harry 1000
WB Potter 1200
Upvotes: 2
Views: 3403
Reputation: 172993
Below solution is with BigQuery Standard SQL
#standardSQL
WITH topProviders AS (
SELECT provider, SUM(customer_price) AS revenue
FROM `integrated-myth-156821.fintest.m10`
GROUP BY provider
ORDER BY revenue DESC
LIMIT 3
),
titles AS (
SELECT provider, title, SUM(customer_price) AS revenue
FROM `integrated-myth-156821.fintest.m10`
GROUP BY provider, title
)
SELECT provider, title, revenue
FROM (
SELECT p.revenue AS sort,
STRUCT(p.provider, '' AS title, p.revenue ) AS providers,
ARRAY_AGG(STRUCT(t.provider, title, t.revenue) ORDER BY title LIMIT 2) AS titles
FROM topProviders AS p
JOIN titles AS t
ON p.provider = t.provider
GROUP BY p.provider, p.revenue
), UNNEST(ARRAY_CONCAT([providers], titles))
ORDER BY sort DESC, provider, title
Upvotes: 2
Reputation: 172993
Try below
#legacySQL
SELECT
provider, title, revenue
FROM
-- Top 2 (alphabetically) Titles for each of Top 3 Providers
(
SELECT
titles.provider AS provider,
titles.title AS title,
titles.revenue AS revenue,
top_providers.revenue AS pos
FROM (
SELECT provider, title, revenue, ROW_NUMBER() OVER(PARTITION BY provider ORDER BY title) AS pos
FROM (
SELECT provider, title, SUM(customer_price) AS revenue
FROM [integrated-myth-156821:fintest.m10]
GROUP BY provider, title
)
) AS titles
JOIN (
SELECT
provider,
SUM(customer_price) AS revenue
FROM [integrated-myth-156821:fintest.m10]
GROUP BY provider
ORDER BY revenue DESC
LIMIT 3 -- sets top 3 providers
) top_providers
ON top_providers.provider = titles.provider
WHERE pos < 3 -- set top two titles
),
-- Top 3 Providers by Revenue
(
SELECT
provider,
'' AS title,
SUM(customer_price) AS revenue,
SUM(customer_price) AS pos
FROM [integrated-myth-156821:fintest.m10]
GROUP BY provider
ORDER BY revenue DESC
LIMIT 3 -- sets top 3 providers
)
ORDER BY pos DESC, provider, title
could you please explain some of the query?
This query is actually a union of two queries
Second (on the very bottom of query) provides top 3 by revenue providers
The first one gives top 2 titles for each provider in top 3 - and by itself is join of top 3 providers with titles revenues
pos in below expression
ROW_NUMBER() OVER(PARTITION BY provider ORDER BY title) AS pos
allows then to filter in only 2 top titles with
WHERE pos < 3
Upvotes: 1