Reputation: 167
I'm trying to write a query that will return the entry with the highest count for each day across a range of BigQuery tables.
I've only been able to get as far as writing the following query, which returns all of entries and their counts for each day, sorted by day and then by the products with the highest entries from highest to lowest.
SELECT
STRFTIME_UTC_USEC(UTC_USEC_TO_day((ts-25200000)*1000),"%Y-%m-%d") AS day,
products.id as product,
count(products.id) as num_entries
FROM
TABLE_DATE_RANGE([table_name_], timestamp('20170801'), timestamp(current_date()))
GROUP BY day, product
ORDER BY day, num_entries desc
e.g.
2017-08-01 . product A . 10
2017-08-01 . product B . 8
2017-08-01 . product C . 4
2017-08-01 . product D . 2
2017-08-02 . product X . 18
2017-08-02 . product Y . 15
2017-08-02 . product Z . 11
2017-08-03 . product N . 20
2017-08-03 . product M . 12
2017-08-03 . product N . 5
2017-08-03 . product O . 3
...
How could I change the query to return only the top entry (highest num_entries) for each day?
e.g.
2017-08-01 . product A . 10
2017-08-02 . product X . 18
2017-08-03 . product N . 20
...
Upvotes: 0
Views: 3140
Reputation: 172994
If for some reason you still need this for BigQuery Legacy SQL - use below
Just wraps your original query with little extra logic
#legacySQL
SELECT
day,
product,
num_entries
FROM (
SELECT
day,
product,
num_entries,
ROW_NUMBER() OVER(PARTITION BY day ORDER BY num_entries DESC) AS win
FROM (
-- your original query START
SELECT
STRFTIME_UTC_USEC(UTC_USEC_TO_day((ts-25200000)*1000),"%Y-%m-%d") AS day,
products.id as product,
COUNT(products.id) as num_entries
FROM
TABLE_DATE_RANGE([table_name_], TIMESTAMP('20170801'), TIMESTAMP(CURRENT_DATE()))
GROUP BY day, product )
-- your original query END
)
WHERE win = 1
Meantime, consider migrating to BigQuery Standard SQL
In you case the query would be something like below
#standardSQL
WITH days AS (
SELECT
PARSE_DATE('%Y%m%d', _TABLE_SUFFIX) AS day,
products.id AS product,
COUNT(*) AS num_entries
FROM `table_name_*`
WHERE _TABLE_SUFFIX >= '20170801'
GROUP BY day, product
)
SELECT top.* FROM (
SELECT ARRAY_AGG(days ORDER BY num_entries DESC LIMIT 1)[OFFSET(0)] AS top
FROM days
GROUP BY day
)
you can start playing with simplified query with dummy data from your question
#standardSQL
WITH days AS (
SELECT '2017-08-01' AS day, 'product A' AS product, 10 AS num_entries UNION ALL
SELECT '2017-08-01', 'product B', 8 UNION ALL
SELECT '2017-08-01', 'product C', 4 UNION ALL
SELECT '2017-08-01', 'product D', 2 UNION ALL
SELECT '2017-08-02', 'product X', 18 UNION ALL
SELECT '2017-08-02', 'product Y', 15 UNION ALL
SELECT '2017-08-02', 'product Z', 11 UNION ALL
SELECT '2017-08-03', 'product N', 20 UNION ALL
SELECT '2017-08-03', 'product M', 12 UNION ALL
SELECT '2017-08-03', 'product N', 5 UNION ALL
SELECT '2017-08-03', 'product O', 3
)
SELECT top.* FROM (
SELECT ARRAY_AGG(days ORDER BY num_entries DESC LIMIT 1)[OFFSET(0)] AS top
FROM days
GROUP BY day
)
result is as expected:
Row day product num_entries
1 2017-08-01 product A 10
2 2017-08-03 product N 20
3 2017-08-02 product X 18
Upvotes: 3
Reputation: 33725
This should work, but note that you should use standard SQL for the query:
#standardSQL
WITH ProductCounts AS (
SELECT
PARSE_DATE('%Y%m%d', _TABLE_SUFFIX) AS date,
products.id AS product,
COUNT(*) AS num_entries
FROM `your_table_*`
WHERE _TABLE_SUFFIX >= '20170801'
GROUP BY date, product
)
SELECT
date,
ARRAY_AGG(product ORDER BY num_entries DESC LIMIT 1)[OFFSET(0)] AS top_product
FROM ProductCounts
GROUP BY date;
Upvotes: 2