ForeverConfused

Reputation: 1777

How do I select only 1 record per user id using ROW_NUMBER without a subquery?

My current method of de-duping is really dumb.

select col1, col2 ... col500 from
(select col1, col2 ... col500, ROW_NUMBER() OVER(PARTITION BY uid) as row_num
 from the_table) t
where row_num=1;

Is there a way to do this without a subquery? Select distinct is not an option as there can be small variations in the columns which are not significant for this output.

Upvotes: 0

Views: 102

Answers (1)

user330315

In Postgres, distinct on () is typically faster than the equivalent solution using a window function, and it doesn't require a sub-query:

select distinct on (uid) *
from the_table
order by uid, something

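You have to supply an order by (which is something you should have done with row_number() as well) to get stable results - otherwise the chosen row is "random". The order by has to start with the distinct on expression; the columns after it decide which row is kept for each uid.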
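To make the ordering point concrete, here is a minimal sketch. It assumes a hypothetical updated_at column and a "newest row wins" rule, neither of which is in the question; the sample rows are made up for illustration only.

```sql
-- Sample data as a CTE so the query can be run as-is.
-- uid/col1 follow the question; updated_at is a hypothetical column.
with the_table as (
    select 1 as uid, 'a' as col1, date '2024-01-01' as updated_at
    union all select 1, 'b', date '2024-02-01'
    union all select 2, 'c', date '2024-01-15'
)
select distinct on (uid) *
from the_table
order by uid,              -- must come first: matches the distinct on expression
         updated_at desc;  -- newest row per uid is the one that is kept
-- returns (1, 'b', 2024-02-01) and (2, 'c', 2024-01-15)
```

If updated_at can have ties within a uid, add another column to the order by as a tie-breaker, otherwise the result is again not stable.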


The above is true for Postgres. You also tagged your question with amazon-redshift - I have no idea if Redshift (which is in fact a very different DBMS) supports the same thing, or whether it is as efficient there.
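If distinct on turns out not to be available, the window function version with an explicit order by is the fallback. This is only a sketch: it uses standard window function syntax, the updated_at column and the sample rows are assumptions for illustration, and I have not checked it on Redshift.

```sql
-- Sample data as a CTE so the query can be run as-is.
-- updated_at is a hypothetical column that decides which row wins.
with the_table as (
    select 1 as uid, 'a' as col1, date '2024-01-01' as updated_at
    union all select 1, 'b', date '2024-02-01'
    union all select 2, 'c', date '2024-01-15'
)
select uid, col1, updated_at
from (
    select t.*,
           -- number the rows of each uid, newest first
           row_number() over (partition by uid order by updated_at desc) as row_num
    from the_table t
) ranked
where row_num = 1;
-- returns (1, 'b', 2024-02-01) and (2, 'c', 2024-01-15), in no guaranteed order
```

This still needs the sub-query, but the order by inside over() makes the kept row deterministic.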

Upvotes: 1
