Reputation: 327
I'm trying the new query scheduling feature in Google BigQuery, but I can't seem to get it to append new records to my table correctly.
I set Custom schedule
to every 15 minutes
and the Destination table write preference
to Append to table
.
SELECT DATETIME_TRUNC(DATETIME(logtime, 'America/Los_Angeles'), MINUTE) log_minute,
COUNT(DISTINCT user_id) users,
COUNT(DISTINCT product_id) unique_products
FROM mytable
WHERE DATE(logtime, 'America/Los_Angeles') >= "2019-05-01"
GROUP BY log_minute
ORDER BY log_minute
I expected to see 1 row per log_minute
, but I'm seeing duplicates: 1 row per log_minute
for each scheduled run so that after an hour, there are 5 duplicates of each row (1 at the start + 1 for every 15 minutes).
Upvotes: 0
Views: 3612
Reputation: 327
Thanks for the tip, Felipe! For anyone who's trying to do the same thing, I edited the query to the following:
MERGE nextvr.sched_test_15min H
USING
(
SELECT TIMESTAMP(DATETIME_TRUNC(DATETIME(logtime, 'America/Los_Angeles'), MINUTE)) log_minute,
COUNT(DISTINCT user_id) users,
COUNT(DISTINCT product_id) products
FROM mytable
WHERE DATE(logtime, 'America/Los_Angeles') >= "2019-05-01"
GROUP BY log_minute
) N
ON H.log_minute = N.log_minute
WHEN MATCHED THEN
UPDATE
SET users = N.users, products = N.products
WHEN NOT MATCHED THEN
INSERT (log_minute, users, products)
VALUES (log_minute, users, products)
When creating the scheduled query, under Destination for query results
section, leave the Table name
field blank.
Upvotes: 1
Reputation: 59225
I expected to see 1 row per log_minute, but I'm seeing duplicates: 1 row per log_minute for each scheduled run
Do you want to append new rows? Of course you'll see a new row every time the query runs - because you are appending rows.
If you want to UPDATE
the existing ones instead and add new ones, schedule a MERGE
.
Upvotes: 1