Reputation: 129
There are millions of record in table. And need to calculate number of duplicate rows present in my table in Redshift. I could achieve it by using below query,
select
sum(cnt) from (select <primary_key>
, count(*)-1 as cnt
from
table_name
group by
<primary_key> having count(*)>1
Thanks.
Upvotes: 3
Views: 3528
Reputation: 4208
If the criteria of duplication is only repeating primary key then
SELECT count(1)-count(distinct <primary_key>) FROM your_table
would work, except if you have specified your column as primary key in Redshift (it doesn't enforce constraint but if you mark a column as primary key count(distinct <primary_key>)
will return the same as count(1)
even if there are duplicate values in this column
Upvotes: 2
Reputation: 534
You can try the following query:
SELECT Column_name, COUNT(*) Count_Duplicate
FROM Table_name
GROUP BY Column_name
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
Upvotes: 4