Keith Yong
Keith Yong

Reputation: 1494

Query conditionally return only one row per distinct id

I am making a Reddit clone and I'm having trouble querying my list of posts, given a logged in user, that shows whether or not logged in user upvoted the post for every post. I made a small example to make things simpler.

I am trying to return only one row per distinct post_id, but prioritize the upvoted column to be t > f > null.

For this example data:

> select * from post;
 id
----
  1
  2
  3

> select * from users;
 id
----
  1
  2

> select * from upvoted;
 user_id | post_id
---------+---------
       1 |       1
       2 |       1

If I am given user_id = 1 I want my query to return:

 postid | user_upvoted
--------+--------------
      1 | t
      2 | f
      3 | f

Since user1 upvoted post1, upvoted is t. Since user1 did not upvote post2, upvoted is f. Same for post3.

Schema

CREATE TABLE IF NOT EXISTS post (
    id              bigserial,
    PRIMARY KEY (id)
);

CREATE TABLE IF NOT EXISTS users (
    id              serial,
    PRIMARY KEY (id)
);

CREATE TABLE IF NOT EXISTS upvoted (
    user_id         integer
        REFERENCES users(id)
        ON DELETE CASCADE ON UPDATE CASCADE,
    post_id         bigint
        REFERENCES post(id)
        ON DELETE CASCADE ON UPDATE CASCADE,
    PRIMARY KEY (user_id, post_id)
);

What I tried so far

SELECT post.id as postid,
    CASE WHEN user_id=1 THEN true ELSE false END as user_upvoted
    FROM post LEFT OUTER JOIN upvoted
    ON post_id = post.id;

Which gives me:

 postid | user_upvoted
--------+--------------
      1 | t
      1 | f
      2 | f
      3 | f

Due to the join, there are two "duplicate" rows that result from the query. I want to priority the row with t > f > null. So I want to keep the 1 | t row.

Full script with schema+data.

Upvotes: 3

Views: 1145

Answers (3)

wildplasser
wildplasser

Reputation: 44250

The exists() operator yields a boolean value:

SELECT p.id
   , EXISTS (SELECT * FROM upvoted x
              WHERE x.post_id = p.id
                AND x.user_id = 1) AS it_was_upvoted_by_user1
FROM post p
        ;

Upvotes: 1

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656714

Since the combination (user_id, post_id) is defined unique in upvoted (PRIMARY KEY), this can be much simpler:

SELECT p.id AS post_id, u.post_id IS NOT NULL AS user_upvoted
FROM   post p
LEFT   JOIN upvoted u ON u.post_id = p.id
                     AND u.user_id = 1;

Simply add user_id = 1 to the join condition. Makes perfect use of the index and should be simplest and fastest.

You also mention NULL, but there are only two distinct states in the result: true / false.

Alternative approach

On second thought, you might be complicating a very basic task. If you are only interested in posts the current user upvoted, use this simple query instead:

SELECT post_id FROM upvoted WHERE user_id = 1;

All other posts are not upvoted by the given user. It would seem we don't have to list those explicitly.

SQL Fiddle.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269773

You should be able to do this with distinct on:

SELECT distinct on (p.id) p.id as postid,
       (CASE WHEN user_id = 1 THEN true ELSE false END) as upvoted
FROM post p LEFT OUTER JOIN
     upvoted u
     ON u.post_id = p.id
ORDER BY p.id, upvoted desc;

Upvotes: 4

Related Questions