Reputation: 392
I am doing a GROUP BY and COUNT(*) on a dataset, and I would like to calculate the percentage of each group over the total.
For example, in this query, I would like to know how much the count() for each state represents over the total ( select count() from publicdata:samples.natality ):
SELECT state, count(*)
FROM [publicdata:samples.natality]
GROUP by state
There are several ways to do it in SQL, but I haven't found a way to do it in Bigquery, does anyone know?
Thanks!
Upvotes: 25
Views: 35594
Reputation: 23
When using Johnny V's solution, it returns frequencies for me. For calculating actual percentages I found that adding a simple *100
works:
SELECT
sex
,COUNT(*) / (SUM(COUNT(*))OVER()) * 100 AS percentage
FROM `powerful-hall-355408.comic_characters_wikia.dc_comics`
GROUP BY sex
Upvotes: 2
Reputation: 1238
You can use a window function to get the percentage of total by group, without the need for a subquery (improving on evan_b's solution):
SELECT
state
,count(*) / (sum(count(*)) OVER()) as pct
FROM
`bigquery-public-data.samples.natality`
GROUP BY
state
Upvotes: 10
Reputation: 1239
Modifying Felipe's answer for the standard SQL BigQuery dialect instead of the Legacy SQL dialect looks like this:
select state, 100*(state_count / total) as pct
from (
SELECT state, count(*) AS state_count, sum(count(*)) OVER() AS total
FROM `bigquery-public-data.samples.natality`
GROUP by state
) s
Documentation of the standard SQL BigQuery aggregate analytic functions (aka 'window functions') is here: https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
Upvotes: 21
Reputation: 59175
Check ratio_to_report, one of the recently announced window functions:
SELECT state, ratio * 100 AS percent FROM (
SELECT state, count(*) AS total, RATIO_TO_REPORT(total) OVER() AS ratio
FROM [publicdata:samples.natality]
GROUP by state
)
state percent
AL 1.4201828131159113
AK 0.23521048665998198
AZ 1.3332896746620975
AR 0.7709591206172346
CA 10.008298605982642
Upvotes: 18
Reputation: 26617
You can do a self join against the total, using a dummy value as a key. For example:
SELECT
t1.state AS state,
t1.cnt AS cnt,
100 * t1.cnt / t2.total as percent
FROM (
SELECT
state,
COUNT(*) AS cnt,
1 AS key
FROM
[publicdata:samples.natality]
WHERE state is not null
GROUP BY
state) AS t1
JOIN (
SELECT
COUNT(*) AS total,
1 AS key
FROM
[publicdata:samples.natality]) AS t2
ON t1.key = t2.key
ORDER BY percent DESC
Upvotes: 5