Reputation: 53
I'm simply trying to identify duplicate values within BigQuery.
My code looks like:
SELECT
address,
title_1,
COUNT(*)
FROM
`target.querytable`
GROUP BY
1,2
HAVING
COUNT (*) > 1
I'm trying to identify duplicate records in the title_1 field and select their corresponding url from the address column along with the sum of the duplication. Ideally the output would look like:
Upvotes: 2
Views: 12178
Reputation: 172993
Below is for BigQuery Standard SQL
#standardSQL
SELECT * FROM (
SELECT *, COUNT(1) OVER(PARTITION BY title_1) dup_count
FROM `target.querytable`
)
WHERE dup_count > 1
Upvotes: 9
Reputation: 126
If you looking to identify the duplicates try to use analytic function
You can use ROW_NUMBER() over a partition of columns that should be unique for you, e.g: ROW_NUMBER() OVER (PARTITION BY COLUMN1, COLUMN2 ORDER BY COLUMN1). Every result that has a rownumber > 1 is a duplicate.
Upvotes: 2