Reputation: 1777
My current method of de-duping is really dumb.
select col1, col2 ... col500 from
(select col1, col2 ... col500, ROW_NUMBER() OVER(PARTITION BY uid) as row_num)
where row_num=1;
Is there a way to do this without a subquery? Select distinct is not an option as there can be small variations in the columns which are not significant for this output.
Upvotes: 0
Views: 102
Reputation:
In Postgres distinct on ()
is typically faster then the equivalent solution using a window function and also doesn't require a sub-query:
select distinct on (uuid) *
from the_table
order by something
You have to supply an order by (which is something you should have done with row_number()
as well) to get stable results - otherwise the chosen row is "random".
The above is true for Postgres. You also tagged your question with amazon-redshift
- I have no idea if Redshift (which is in fact a very different DBMS) supports the same thing nor if it is as efficient.
Upvotes: 1