Henu
Henu

Reputation: 1732

What is semi-join in database?

I am having trouble while trying to understand the concept of semi-join and how it is different from conventional join. I have tried some article already but not satisfied with the explanation, could someone please help me to understand it?

Upvotes: 67

Views: 87886

Answers (3)

Iurii Ant
Iurii Ant

Reputation: 1025

A semi-join is basically a subquery that is applied in WHERE clause.

Simple example. Let's select students with grades using INNER JOIN:

SELECT DISTINCT s.id
FROM students AS s
INNER JOIN grades AS g
    ON g.student_id = s.id

Now, the same with left semi-join:

SELECT s.id
FROM students AS s
WHERE EXISTS (SELECT 1 
              FROM grades AS g
              WHERE g.student_id = s.id)

The latter is generally more efficient (depending on concrete DBMS and query optimizer).

But note that after matching, INNER JOIN returns the result of both tables, whereas left semi-join returns only the left table records (that's where semi comes from). So the usage may depend on your desired output.

Upvotes: 73

Lukasz Szozda
Lukasz Szozda

Reputation: 175756

As far as I know SQL dialects that support SEMIJOIN/ANTISEMI are U-SQL/Cloudera Impala.

SEMIJOIN:

Semijoins are U-SQL’s way filter a rowset based on the inclusion of its rows in another rowset. Other SQL dialects express this with the SELECT * FROM A WHERE A.key IN (SELECT B.key FROM B) pattern.

More info Semi Join and Anti Join Should Have Their Own Syntax in SQL:

“Semi” means that we don’t really join the right hand side, we only check if a join would yield results for any given tuple.

-- IN
SELECT *
FROM Employee
WHERE DeptName IN (
  SELECT DeptName
  FROM Dept
)

-- EXISTS
SELECT *
FROM Employee
WHERE EXISTS (
  SELECT 1
  FROM Dept
  WHERE Employee.DeptName = Dept.DeptName
)

EDIT:

Another dialect that supports SEMI/ANTISEMI join is KQL:

kind=leftsemi (or kind=rightsemi)

Returns all the records from the left side that have matches from the right. The result table contains columns from the left side only.

let t1 = datatable(key:long, value:string)  
[1, "a",  
2, "b",
3, "c"];
let t2 = datatable(key:long)
[1,3];
t1 | join kind=leftsemi (t2) on key

demo

Output:

key  value
1    a
3    c

Upvotes: 26

Simon van Endern
Simon van Endern

Reputation: 163

As I understand, a semi join is a left join or right join:

What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?

So the difference between a left (semi) join and a "conventional" join is that you only retrieve the data of the left table (where you have a match on your join condition). Whereas with a full (outer) join (I think thats what you mean by conventional join), you retrieve the data of both tables where your condition matches.

Upvotes: -10

Related Questions