Reputation: 208
I am trying to use Case when
with partition to create a new row and mark whether its duplicated or not
My desired output is shown below
The query I am using
I cant find the error here. It seemed to work in SQL Server before
SELECT *,
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY
x , y > 1) THEN TRUE ELSE FALSE
END AS is_duplicated
FROM users
Upvotes: 1
Views: 2230
Reputation: 1269953
As the two other answers point out, the >
is not part of the partition by
.
I want to point out first that using row_number()
without order by
is highly, highly discouraged. Without an order by, the result can be different on each run.
More importantly, you don't want row_number()
. You want count()
-- because you want TRUE
on all rows where there are duplicates.
The second is that a case
expression is not necessary at all, because Postgres has a boolean type.
So:
SELECT u.*,
(COUNT(*) OVER (PARTITION BY x, y) > 1) AS is_duplicated
FROM users u;
Here is a db<>fiddle.
Upvotes: 1
Reputation: 3905
I guess you put a parenthesis wrong. Could you try CASE WHEN ROW_NUMBER() OVER (PARTITION BY x , y) > 1 THEN TRUE ELSE FALSE END
?
By the way, in PostgreSQL, I guess you could write this as well:
SELECT
*,
(ROW_NUMBER() OVER (PARTITION BY x , y) > 1) AS is_duplicated
FROM
users
Note that if you want to make sure that the correct rows are marked as duplicates, you might also want to include an ORDER BY
in your OVER
clause.
By the way, when testing this query in Microsoft SQL Server, I get an error message if I omit the ORDER BY
in the OVER
clause. It does work fine without an ORDER BY
in PostgreSQL, however.
Upvotes: 0
Reputation: 646
I think > 1
should be out of (PARTITIONBY x, y)
like this:
SELECT *,
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY
x , y) > 1 THEN TRUE ELSE FALSE
END AS is_duplicated
FROM users
Comprasion operator itself results on boolean, so you don't need case:
SELECT *,
ROW_NUMBER() OVER (PARTITION BY
x , y) > 1
AS is_duplicated
FROM users
But this will result, marking duplicate only the rows after the first row, meaning first Carlo Thomas
will not be duplicate.
So for the desired result, you need this:
SELECT *,
(SELECT count(*) from users t WHERE t.x=u.x and t.y=u.y) > 1
AS is_duplicated
FROM users u;
Upvotes: 3