Jordan Lowry
Jordan Lowry

Reputation: 53

Identifying Duplicate Values - Google BigQuery

I'm simply trying to identify duplicate values within BigQuery.

My code looks like:

SELECT
  address,
  title_1,
  COUNT(*)
FROM
  `target.querytable`
GROUP BY
  1,2
HAVING
  COUNT (*) > 1

I'm trying to identify duplicate records in the title_1 field and select their corresponding url from the address column along with the sum of the duplication. Ideally the output would look like:

enter image description here

Upvotes: 2

Views: 12178

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Below is for BigQuery Standard SQL

#standardSQL
SELECT * FROM (
  SELECT *, COUNT(1) OVER(PARTITION BY title_1) dup_count
  FROM `target.querytable`
)
WHERE dup_count > 1

Upvotes: 9

Never_Give_Up
Never_Give_Up

Reputation: 126

If you looking to identify the duplicates try to use analytic function

You can use ROW_NUMBER() over a partition of columns that should be unique for you, e.g: ROW_NUMBER() OVER (PARTITION BY COLUMN1, COLUMN2 ORDER BY COLUMN1). Every result that has a rownumber > 1 is a duplicate.

Upvotes: 2

Related Questions