Identifying Duplicate Values - Google BigQuery

Question

I'm simply trying to identify duplicate values within BigQuery.

My code looks like:

SELECT
  address,
  title_1,
  COUNT(*)
FROM
  `target.querytable`
GROUP BY
  1,2
HAVING
  COUNT(*) > 1

I'm trying to identify duplicate records in the title_1 field and select each one's corresponding URL from the address column, along with the number of times it is duplicated. Ideally the output would look like:

Mikhail Berlyant · Accepted Answer

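The query below is for BigQuery Standard SQL. It counts the rows that share each title_1 value with a window function, then keeps only the rows whose title appears more than once, so every duplicated row is returned with its address and its dup_count.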

#standardSQL
SELECT * FROM (
  SELECT *, COUNT(1) OVER(PARTITION BY title_1) dup_count
  FROM `target.querytable`
)
WHERE dup_count > 1
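
If one row per duplicated title is preferred over one row per duplicated record, an aggregate form may also work. This is only a sketch and not part of the original answer: it assumes the same table and column names as the question, and the aliases dup_count and addresses are picked here for illustration.

```sql
#standardSQL
-- Sketch: one row per title_1 value that occurs more than once.
-- Grouping by title_1 alone (not by address as well) is what lets
-- rows with different addresses count as duplicates of each other.
SELECT
  title_1,
  COUNT(*) AS dup_count,                         -- how many rows share this title
  ARRAY_AGG(address IGNORE NULLS) AS addresses   -- the URLs of those rows
FROM `target.querytable`
GROUP BY title_1
HAVING COUNT(*) > 1
ORDER BY dup_count DESC
```

IGNORE NULLS is included because BigQuery raises an error when an array in the final result contains a NULL element. Note that dup_count still counts rows whose address is NULL, so it can be larger than the length of addresses.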
