inaki
inaki

Reputation: 392

Calculate percentage of group using GROUP BY

I am doing a GROUP BY and COUNT(*) on a dataset, and I would like to calculate the percentage of each group over the total.

For example, in this query, I would like to know how much the count() for each state represents over the total ( select count() from publicdata:samples.natality ):

SELECT state, count(*)
FROM [publicdata:samples.natality]
GROUP by state

There are several ways to do it in SQL, but I haven't found a way to do it in Bigquery, does anyone know?

Thanks!

Upvotes: 25

Views: 35594

Answers (5)

dr.pas
dr.pas

Reputation: 23

When using Johnny V's solution, it returns frequencies for me. For calculating actual percentages I found that adding a simple *100 works:

SELECT
  sex
  ,COUNT(*) / (SUM(COUNT(*))OVER()) * 100 AS percentage
FROM `powerful-hall-355408.comic_characters_wikia.dc_comics` 
GROUP BY sex

Upvotes: 2

Johnny V
Johnny V

Reputation: 1238

You can use a window function to get the percentage of total by group, without the need for a subquery (improving on evan_b's solution):

SELECT 
   state
   ,count(*) / (sum(count(*)) OVER()) as pct
FROM  
   `bigquery-public-data.samples.natality` 
GROUP BY 
   state

Upvotes: 10

evan_b
evan_b

Reputation: 1239

Modifying Felipe's answer for the standard SQL BigQuery dialect instead of the Legacy SQL dialect looks like this:

select state, 100*(state_count / total) as pct
from (
  SELECT state, count(*) AS state_count, sum(count(*)) OVER() AS total
  FROM `bigquery-public-data.samples.natality` 
  GROUP by state
) s

Documentation of the standard SQL BigQuery aggregate analytic functions (aka 'window functions') is here: https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts

Upvotes: 21

Felipe Hoffa
Felipe Hoffa

Reputation: 59175

Check ratio_to_report, one of the recently announced window functions:

SELECT state, ratio * 100 AS percent FROM (
 SELECT state, count(*) AS total, RATIO_TO_REPORT(total) OVER() AS ratio
 FROM [publicdata:samples.natality]
 GROUP by state
)

state   percent
AL      1.4201828131159113   
AK      0.23521048665998198  
AZ      1.3332896746620975   
AR      0.7709591206172346   
CA      10.008298605982642

Upvotes: 18

Jordan Tigani
Jordan Tigani

Reputation: 26617

You can do a self join against the total, using a dummy value as a key. For example:

SELECT
  t1.state AS state,
  t1.cnt AS cnt,
  100 * t1.cnt / t2.total as percent
FROM (
  SELECT
    state,
    COUNT(*) AS cnt,
    1 AS key
  FROM
    [publicdata:samples.natality]
  WHERE state is not null
  GROUP BY
    state) AS t1
JOIN (
  SELECT
    COUNT(*) AS total,
    1 AS key
  FROM
    [publicdata:samples.natality]) AS t2
ON t1.key = t2.key
ORDER BY percent DESC

Upvotes: 5

Related Questions