binbin
binbin

Reputation: 177

Query with UNION or OR?

I have a table friend used to store the relationship between two user.

For example: (1,2) means user1 and user2 are friends. (2,1) means the same, but there we won't store that, making uid1 < uid2 manually:

CREATE TABLE public.friend (
  uid1 INTEGER,
  uid2 INTEGER
);
CREATE INDEX index_uid1 ON friend USING BTREE (uid1);
CREATE INDEX index_uid2 ON friend USING BTREE (uid2);

To find the friends of uid=2, I can use:

Sql1:

select * from friend where uid1=2
union
select * from friend where uid2=2;

Sql2:

select * from friend uid1=2 or uid2=2

What I get is that sql2 is better than sql1 in performance.

But sql1 is recommended. Is that correct?

Upvotes: 1

Views: 116

Answers (3)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656854

I have a table friend used to store the relationship between two user.

For example: (1,2) means user1 and user2 are friends. (2,1) means the same, but there we won't store that, making uid1 < uid2 manually:

Typically, you would implement that with have a PRIMARY KEY on (udi1, uid2) and a CHECK constraint enforcing uid1 < uid2.

CREATE TABLE public.friend (
   uid1 integer
 , uid2 integer
 , PRIMARY KEY (uid1, uid2)
 , CONSTRAINT uid2_gt_uid1 CHECK (uid1 < uid2)
);

CREATE INDEX index_uid2 ON friend USING BTREE (uid2);

You don't need the other index, it's covered by the index of the PK;

CREATE INDEX index_uid1 ON friend USING BTREE (uid1);

Then there cannot be duplicates (including switched duplicates) and nobody can be friend with him/her self either, and your query can simply be:

SELECT * FROM friend WHERE 2 IN (uid1, uid2);

... which is short for:

SELECT * FROM friend WHERE uid1 = 2 OR uid2 = 2;

And the UNION variant is now logically identical:

SELECT * FROM friend WHERE uid1=2
UNION
SELECT * FROM friend WHERE uid2=2;

But you would UNION ALL instead of just UNION as there are no duplicates to begin with and UNION ALL is cheaper. But still slightly more expensive than the single SELECT above.

Duplicates?

There are three possible sources of duplicates in the UNION ALL query:

  1. Duplicate rows in the underlying table (ruled out by the PK).
  2. Rows fetched multiple times by more than one SELECT branch.
  3. In your particular case: logical duplicates with switched IDs (ruled out by the CHECK constraint).

Once you understand this, you also understand the implications of the different query techniques. With the suggest setup, only 2. remains as possible issue.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269883

union incurs overhead for removing duplicates. Perhaps the most efficient method for the query is:

select f.* from friend f where uid1 = 2
union all
select f.* from friend f where uid2 = 2 and uid1 <> 2;

In particular, this can take advantage of indexes on f(uid1) and f(uid2). Your second version is probably doing a full table scan.

Upvotes: 1

Andrew
Andrew

Reputation: 27294

They are not technically the same, the union operator will remove duplicates, while the 2nd example does not.

Upvotes: 2

Related Questions