Tai
Tai

Reputation: 167

Select highest count per day in BigQuery

I'm trying to write a query that will return the entry with the highest count for each day across a range of BigQuery tables.

I've only been able to get as far as writing the following query, which returns all of entries and their counts for each day, sorted by day and then by the products with the highest entries from highest to lowest.

SELECT 
   STRFTIME_UTC_USEC(UTC_USEC_TO_day((ts-25200000)*1000),"%Y-%m-%d") AS day,
   products.id as product, 
   count(products.id) as num_entries
FROM 
   TABLE_DATE_RANGE([table_name_], timestamp('20170801'), timestamp(current_date()))
GROUP BY day, product
ORDER BY day, num_entries desc

e.g.

2017-08-01 . product A . 10
2017-08-01 . product B . 8
2017-08-01 . product C . 4
2017-08-01 . product D . 2
2017-08-02 . product X . 18
2017-08-02 . product Y . 15
2017-08-02 . product Z . 11
2017-08-03 . product N . 20
2017-08-03 . product M . 12
2017-08-03 . product N . 5
2017-08-03 . product O . 3
...

How could I change the query to return only the top entry (highest num_entries) for each day?

e.g.

2017-08-01 . product A . 10
2017-08-02 . product X . 18
2017-08-03 . product N . 20
...

Upvotes: 0

Views: 3140

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172994

If for some reason you still need this for BigQuery Legacy SQL - use below
Just wraps your original query with little extra logic

#legacySQL
SELECT 
  day, 
  product, 
  num_entries
FROM (
  SELECT 
    day, 
    product, 
    num_entries, 
    ROW_NUMBER() OVER(PARTITION BY day ORDER BY num_entries DESC) AS win
  FROM (
  -- your original query START
    SELECT 
       STRFTIME_UTC_USEC(UTC_USEC_TO_day((ts-25200000)*1000),"%Y-%m-%d") AS day,
       products.id as product, 
       COUNT(products.id) as num_entries
    FROM 
       TABLE_DATE_RANGE([table_name_], TIMESTAMP('20170801'), TIMESTAMP(CURRENT_DATE()))
    GROUP BY day, product        )
  -- your original query END
)
WHERE win = 1  

Meantime, consider migrating to BigQuery Standard SQL
In you case the query would be something like below

#standardSQL
WITH days AS (
  SELECT
    PARSE_DATE('%Y%m%d', _TABLE_SUFFIX) AS day,
    products.id AS product,
    COUNT(*) AS num_entries
  FROM `table_name_*`
  WHERE _TABLE_SUFFIX >= '20170801'
  GROUP BY day, product
)
SELECT top.* FROM (
  SELECT ARRAY_AGG(days ORDER BY num_entries DESC LIMIT 1)[OFFSET(0)] AS top
  FROM days
  GROUP BY day
) 

you can start playing with simplified query with dummy data from your question

#standardSQL
WITH days AS (
  SELECT '2017-08-01' AS day, 'product A' AS product, 10 AS num_entries UNION ALL
  SELECT '2017-08-01', 'product B', 8 UNION ALL
  SELECT '2017-08-01', 'product C', 4 UNION ALL
  SELECT '2017-08-01', 'product D', 2 UNION ALL
  SELECT '2017-08-02', 'product X', 18 UNION ALL
  SELECT '2017-08-02', 'product Y', 15 UNION ALL
  SELECT '2017-08-02', 'product Z', 11 UNION ALL
  SELECT '2017-08-03', 'product N', 20 UNION ALL
  SELECT '2017-08-03', 'product M', 12 UNION ALL
  SELECT '2017-08-03', 'product N', 5 UNION ALL
  SELECT '2017-08-03', 'product O', 3
)
SELECT top.* FROM (
  SELECT ARRAY_AGG(days ORDER BY num_entries DESC LIMIT 1)[OFFSET(0)] AS top
  FROM days
  GROUP BY day
)

result is as expected:

Row day         product     num_entries  
1   2017-08-01  product A   10   
2   2017-08-03  product N   20   
3   2017-08-02  product X   18   

Upvotes: 3

Elliott Brossard
Elliott Brossard

Reputation: 33725

This should work, but note that you should use standard SQL for the query:

#standardSQL
WITH ProductCounts AS (
  SELECT
    PARSE_DATE('%Y%m%d', _TABLE_SUFFIX) AS date,
    products.id AS product,
    COUNT(*) AS num_entries
  FROM `your_table_*`
  WHERE _TABLE_SUFFIX >= '20170801'
  GROUP BY date, product
)
SELECT
  date,
  ARRAY_AGG(product ORDER BY num_entries DESC LIMIT 1)[OFFSET(0)] AS top_product
FROM ProductCounts
GROUP BY date;

Upvotes: 2

Related Questions