How do I select only 1 record per user id using ROW_NUMBER without a subquery?

Question

My current method of de-duping is really dumb.

select col1, col2 ... col500
from (
    select col1, col2 ... col500,
           ROW_NUMBER() OVER (PARTITION BY uid) as row_num
    from the_table
) as t
where row_num = 1;

Is there a way to do this without a subquery? SELECT DISTINCT is not an option, because some columns contain small variations that are not significant for this output.

user330315 · Accepted Answer

In Postgres, distinct on () is typically faster than the equivalent solution using a window function, and it doesn't require a subquery:

select distinct on (uid) *
from the_table
order by uid, something

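You have to supply an order by (which is something you should have done with row_number() as well) to get stable results; otherwise the chosen row is "random". Postgres also requires the order by to start with the distinct on column(s), e.g. order by uid, something: the leading uid groups the rows, and the remaining columns decide which row per uid is kept.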
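To make the ordering point concrete, here is a minimal sketch using Python's bundled sqlite3. SQLite has no distinct on, so the SQL part uses the row_number() form with an explicit order by, and a small pure-Python function emulates what distinct on (uid) ... order by uid, created_at desc keeps (the first row per uid after sorting). The table the_table and the columns uid, created_at and payload are hypothetical stand-ins, and "newest row wins" is just an assumed tie-breaking rule.

```python
import sqlite3

# Hypothetical sample data: several rows per uid that differ only slightly.
ROWS = [
    (1, "2024-01-03", "a-new"),
    (1, "2024-01-01", "a-old"),
    (2, "2024-01-02", "b-only"),
    (3, "2024-01-05", "c-new"),
    (3, "2024-01-04", "c-old"),
]


def dedupe_sql(rows):
    """Keep the newest row per uid using ROW_NUMBER() with an explicit ORDER BY."""
    con = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
    con.execute("create table the_table (uid integer, created_at text, payload text)")
    con.executemany("insert into the_table values (?, ?, ?)", rows)
    result = con.execute(
        """
        select uid, created_at, payload
        from (
            select t.*,
                   row_number() over (partition by uid order by created_at desc) as row_num
            from the_table t
        ) as ranked
        where row_num = 1
        order by uid
        """
    ).fetchall()
    con.close()
    return result


def dedupe_distinct_on_style(rows):
    """Emulate Postgres DISTINCT ON (uid) ... ORDER BY uid, created_at DESC:
    sort by the same keys, then keep the first row seen for each uid."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)  # created_at desc
    ordered = sorted(ordered, key=lambda r: r[0])  # stable sort: uid asc
    seen = set()
    kept = []
    for row in ordered:
        if row[0] not in seen:
            seen.add(row[0])
            kept.append(row)
    return kept


if __name__ == "__main__":
    # both print [(1, '2024-01-03', 'a-new'), (2, '2024-01-02', 'b-only'), (3, '2024-01-05', 'c-new')]
    print(dedupe_sql(ROWS))
    print(dedupe_distinct_on_style(ROWS))
```

Dropping created_at from either ordering leaves the choice of row per uid up to the engine, which is exactly the "random" result described above.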

The above is true for Postgres. You also tagged your question with amazon-redshift; I have no idea whether Redshift (which is in fact a very different DBMS) supports the same thing, or whether it is as efficient.
