stkvtflw
stkvtflw

Reputation: 13507

Deduplicate a partitioned table in BigQuery

I'm trying to deduplicate an ingestion-time partitioned table in BigQuery:

MERGE dataset.table_name targ
USING (
  SELECT * EXCEPT(row_number)
  FROM (
    SELECT
      *,
      ROW_NUMBER() OVER (PARTITION BY uid ORDER BY _PARTITIONDATE DESC) row_number
    FROM dataset.table_name
  )
  WHERE row_number = 1
) src
ON FALSE
WHEN NOT MATCHED BY SOURCE THEN DELETE
WHEN NOT MATCHED THEN INSERT ROW

Getting the following error:

Omitting INSERT target column list is unsupported for ingestion-time partitioned table dataset.table_name
  1. CREATE OR REPLACE TABLE can't create partitioned tables.
  2. Creating a partitioned table using DDL and then inserting requires defining schema inside the query, which I'm trying to avoid.

I'm looking for a simple universal query that I can apply to different tables with minimal adjustment. Like the one above.

Upvotes: 0

Views: 322

Answers (1)

Tibic4
Tibic4

Reputation: 3767

You need to specify the columns in the INSERT statement. You can use the following query to get the column names:

MERGE dataset.table_name targ
USING (
  SELECT * EXCEPT(row_number)
  FROM (
    SELECT
      *,
      _PARTITIONTIME,
      ROW_NUMBER() OVER (PARTITION BY uid ORDER BY _PARTITIONTIME DESC) row_number
    FROM dataset.table_name
  )
  WHERE row_number = 1
) src
ON FALSE
WHEN NOT MATCHED BY SOURCE THEN DELETE
WHEN NOT MATCHED THEN INSERT (uid, _PARTITIONTIME, column1, column2, column3)
VALUES (uid, _PARTITIONTIME, column1, column2, column3) 

But it's not universal.

Upvotes: 2

Related Questions