Royal
Royal

Reputation: 208

Using PostgreSQL Case When with Partition

I am trying to use Case when with partition to create a new row and mark whether its duplicated or not

My desired output is shown below

enter image description here

The query I am using

I cant find the error here. It seemed to work in SQL Server before

SELECT *,
    CASE 
        WHEN ROW_NUMBER() OVER (PARTITION BY 
            x , y > 1) THEN TRUE ELSE FALSE
    END AS is_duplicated
    FROM users
   

Upvotes: 1

Views: 2230

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1269953

As the two other answers point out, the > is not part of the partition by.

I want to point out first that using row_number() without order by is highly, highly discouraged. Without an order by, the result can be different on each run.

More importantly, you don't want row_number(). You want count() -- because you want TRUE on all rows where there are duplicates.

The second is that a case expression is not necessary at all, because Postgres has a boolean type.

So:

SELECT u.*,
       (COUNT(*) OVER (PARTITION BY x, y) > 1) AS is_duplicated
FROM users u;

Here is a db<>fiddle.

Upvotes: 1

Bart Hofland
Bart Hofland

Reputation: 3905

I guess you put a parenthesis wrong. Could you try CASE WHEN ROW_NUMBER() OVER (PARTITION BY x , y) > 1 THEN TRUE ELSE FALSE END?

By the way, in PostgreSQL, I guess you could write this as well:

SELECT
    *,
    (ROW_NUMBER() OVER (PARTITION BY x , y) > 1) AS is_duplicated
FROM
    users

Note that if you want to make sure that the correct rows are marked as duplicates, you might also want to include an ORDER BY in your OVER clause.

By the way, when testing this query in Microsoft SQL Server, I get an error message if I omit the ORDER BY in the OVER clause. It does work fine without an ORDER BY in PostgreSQL, however.

Upvotes: 0

Elbek
Elbek

Reputation: 646

I think > 1 should be out of (PARTITIONBY x, y) like this:

SELECT *,
    CASE 
        WHEN ROW_NUMBER() OVER (PARTITION BY 
            x , y) > 1 THEN TRUE ELSE FALSE
    END AS is_duplicated
    FROM users

Comprasion operator itself results on boolean, so you don't need case:

SELECT *,
     
       ROW_NUMBER() OVER (PARTITION BY 
            x , y) > 1
     AS is_duplicated
    FROM users

But this will result, marking duplicate only the rows after the first row, meaning first Carlo Thomas will not be duplicate.

So for the desired result, you need this:

SELECT *,
     
       (SELECT count(*) from users t WHERE t.x=u.x and t.y=u.y) > 1
     AS is_duplicated
    FROM users u;

Upvotes: 3

Related Questions