doing a group by in google Bigquery

Question

Here is my BigQuery

SELECT word,word_count,corpus_date FROM 
[publicdata:samples.shakespeare] 
WHERE word="the" ORDER BY word_count asc

which gives output as

    Row word    word_count corpus_date   
    1   the       57       1609  
    2   the       106      0     
    3   the       287      1609  
    4   the       353      1594  
    5   the       363      0     
    6   the       399      1592  
    7   the       421      1611

I want the data to be group by corpus_date.I tried using a group by corpus_date

    SELECT word,word_count,corpus_date FROM 
   [publicdata:samples.shakespeare] 
    WHERE word="the" group by corpus_date 
    ORDER BY word_count asc

but it did'nt allow me to do a group by corpus_date.Any way to get data grouped by corpus_date

Michael Manoochehri · Accepted Answer

You'll need to GROUP BY all non aggregated values in your query. However, since you are simply looking for a single word, you don't need to show or even GROUP BY that word in the result set (it's implicitly selected using the word="the" clause).

Therefore, if you want the total sum of word counts for the word "the" grouped by date, you can run something like this:

SELECT
  SUM(word_count) as sum_for_the,
  corpus_date
FROM
  [publicdata:samples.shakespeare]
WHERE
  word="the"
GROUP BY
  corpus_date
ORDER BY
  sum_for_the ASC;

That's not super useful on it's own... so if you want to do something more involved, such as learn which corpus the count per date comes from, SUM the counts of the word and list the corpora using a query like this:

SELECT
  SUM(word_count) AS sum_for_the, corpus, corpus_date
FROM
  [publicdata:samples.shakespeare]
WHERE
  word="the"
GROUP BY
  corpus_date, corpus
ORDER BY
  sum_for_the ASC;

For listing all volumes that a word appeared in per year, I like to use the GROUP_CONCAT function. The word "the" appears in everything, so it's probably not as interesting as a less common word, like "swagger." (Which is one of many words invented by Shakespeare).

SELECT
  SUM(word_count) AS word_sum, GROUP_CONCAT(corpus) as corpora, corpus_date
FROM
  [publicdata:samples.shakespeare]
WHERE
  word="swagger"
GROUP BY
  corpus_date ORDER BY corpus_date ASC;

Even more fun is to look at word prefixes, and GROUP BY variations of a word per volume and date:

SELECT
  word, SUM(word_count) AS word_sum, GROUP_CONCAT(corpus) as corpora, corpus_date
FROM
  [publicdata:samples.shakespeare]
WHERE
  word CONTAINS "swagger"
GROUP BY
  word, corpus_date
ORDER BY
  corpus_date ASC
IGNORE CASE;

Check out the BigQuery Query Language reference and the BigQuery Cookbook for more examples.

doing a group by in google Bigquery

Answers (1)

Related Questions