Deduplicate a partitioned table in BigQuery

Question

I'm trying to deduplicate an ingestion-time partitioned table in BigQuery:

MERGE dataset.table_name targ
USING (
  SELECT * EXCEPT(row_number)
  FROM (
    SELECT
      *,
      ROW_NUMBER() OVER (PARTITION BY uid ORDER BY _PARTITIONDATE DESC) row_number
    FROM dataset.table_name
  )
  WHERE row_number = 1
) src
ON FALSE
WHEN NOT MATCHED BY SOURCE THEN DELETE
WHEN NOT MATCHED THEN INSERT ROW

Getting the following error:

Omitting INSERT target column list is unsupported for ingestion-time partitioned table dataset.table_name

CREATE OR REPLACE TABLE can't create partitioned tables.
Creating a partitioned table using DDL and then inserting requires defining schema inside the query, which I'm trying to avoid.

I'm looking for a simple universal query that I can apply to different tables with minimal adjustment. Like the one above.

Tibic4 · Accepted Answer

You need to specify the columns in the INSERT statement. You can use the following query to get the column names:

MERGE dataset.table_name targ
USING (
  SELECT * EXCEPT(row_number)
  FROM (
    SELECT
      *,
      _PARTITIONTIME,
      ROW_NUMBER() OVER (PARTITION BY uid ORDER BY _PARTITIONTIME DESC) row_number
    FROM dataset.table_name
  )
  WHERE row_number = 1
) src
ON FALSE
WHEN NOT MATCHED BY SOURCE THEN DELETE
WHEN NOT MATCHED THEN INSERT (uid, _PARTITIONTIME, column1, column2, column3)
VALUES (uid, _PARTITIONTIME, column1, column2, column3)

But it's not universal.

Deduplicate a partitioned table in BigQuery

Answers (1)

Related Questions