Reputation: 501

Relational division SQL

I programmed a filter which generates a Query to show special employees.

I have table employees and a lot of 1:1, 1:n and n:m relationships e.g. for skills and languages for the employees like this:

Employees
id name
1  John
2  Mike

Skills
id skill experience
1  PHP   3
2  SQL   1

Employee_Skills
eid sid
1   1
1   2

Now I want to filter employees which have at least 2 years experience in using PHP and 1 year SQL.

My filter always generates a correct working Query for every table, relationship and field.

But now my problem is when I would like to filter the same field in a related table multiple times with a and it does not work.

e.g.
John PHP 3
John SQL 1

PHP and SQL are different rows so AND can not work.

I tried using group_concat and find_in_set but I have the problem that I can not filter experience over 2 years with find_in_set and find_in_set does not know PHP is 3 and SQL is 1.

I also tried

WHERE emp.id IN (SELECT eid FROM Employee_Skills WHERE sid IN (SELECT id FROM Skills WHERE skill = 'PHP' AND experience > 1)) AND emp.id IN (SELECT eid FROM Employee_Skills WHERE sid IN (SELECT id FROM Skills WHERE skill = 'SQL' AND experience > 0))

which works for this example, but it only works for n:m and it too complex to know the relationship type.

I have the final Query with

ski.skill = 'PHP' AND ski.experience > 1 AND ski.skill = 'SQL' AND ski.experience > 0

and I would like to manipulate the Query to make it work.

How does a Query have to look like to deal with relational division.

Upvotes: 1

Answers (3)

LSerni

Reputation: 57408

The straightforward way would be to repeatedly JOIN the skills:

SELECT e.*
    FROM Employees AS e

    JOIN Employee_Skills AS j1 ON (e.id = j1.eid)
    JOIN Skills AS s1 ON (j1.sid = s1.id AND s1.skill = 'PHP' AND s1.experience > 3)

    JOIN Employee_Skills AS j2 ON (e.id = j2.eid)
    JOIN Skills AS s2 ON (j2.sid = s2.id AND s2.skill = 'SQL' AND s2.experience > 1)

    ...

Since all the clauses are required this translated to a straight JOIN.

You will need to add two JOINs for each clause, but they're quite fast joins.

A more hackish way would be to compress the skills into a code in a 1:1 relation with the employees. If experience never exceeds, say, 30, then you can multiply the first condition's experience by 1, the second by 30, the third by 30*30, the fourth by 30*30*30... and never get an overflow.

SELECT eid, SUM(CASE skill 
                      WHEN 'PHP' THEN 30*experience
                      WHEN 'SQL' THEN  1*experience) AS code
FROM Employees_Skills JOIN Skills ON (Skills.id = Employees_Skills.sid)
GROUP BY eid HAVING code > 0;

Actually since you want 3 years PHP, you can HAVE code > 91. If you had three conditions with experiences 2, 3 and 5, you would request more than x = 2*30*30 + 3*30 + 5. This only serves to whittle the results, since 3*30*30 + 2*30 + 4 still passes the filter but is of no use to you. But since you want a restriction on code, and "> x" costs the same as "> 0" and gives better results... (if you needed more complex filtering than a series of AND, > 0 is safer, though).

The table above you join with Employees, then on the result you perform the true filtering, requiring

((code / 30*30) % 30) > 7  // for instance :-)
AND
((code / 30) % 30) > 3     // for PHP
AND
((code /  1) % 30) > 1     // for SQL

(the *1 and /1 are superfluous, and only inserted to clarify)

This solution requires a full table scan on Skills, with no real possibility of automatic optimizations. So it is slower than the other solution. On the other hand, its cost grows much more slowly, so if you have complex queries, or need OR operators or conditional expressions instead of ANDs, it may be more convenient to implement the "hackish" solution.

Upvotes: 0

trincot

Reputation: 350345

You could make a kind of pivot query, where you put the experience in each of all of the known skills in columns. This could be a long query, but you could build it dynamically in php, so it would add all skills as columns to the final query, which would look like this:

SELECT     e.*, php_exp, sql_exp
FROM       Employee e
INNER JOIN (
        SELECT     es.eid,
                   SUM(CASE s.skill WHEN 'PHP' THEN s.experience END) php_exp,
                   SUM(CASE s.skill WHEN 'SQL' THEN s.experience END) sql_exp,
                   SUM(CASE s.skill WHEN 'JS'  THEN s.experience END)  js_exp
                   -- do the same for other skills here --
        FROM       Employee_Skills es
        INNER JOIN Skills s ON  es.sid = s.id
        GROUP BY   es.eid
        ) pivot ON pivot.eid = e.id
WHERE       php_exp > 2 AND sql_exp > 0;

The WHERE clause is then very concise and intuitive: you use the logical operators like in other circumstances.

If the set of skills is rather static, you could even create a view for the sub-query. Then the final SQL is quite concise.

Here is a fiddle.

Alternative

Using the same principle, but using the SUM in the HAVING clause, you can avoid gathering all skill's experiences:

SELECT     e.*
FROM       Employee e
INNER JOIN (
        SELECT     es.eid
        FROM       Employee_Skills es
        INNER JOIN Skills s ON  es.sid = s.id
        GROUP BY   es.eid
        HAVING     SUM(CASE s.skill WHEN 'PHP' THEN s.experience END) > 2
        AND        SUM(CASE s.skill WHEN 'SQL' THEN s.experience END) > 0
        ) pivot ON pivot.eid = e.id;

Here is a fiddle.

You can also replace the CASE construct by the IF function, like this:

        HAVING     SUM(IF(s.skill='PHP', s.experience, 0)) > 2
        ... etc.

But it comes down to the same.

Upvotes: 0

Iłya Bursov

Reputation: 24146

you can try next approach:

select * from Employees
where id in (
    select eid
    from Employee_Skills as a
    inner join
    Skills as ski
    on (a.sid = ski.id)
    where
    (ski.skill = 'PHP' AND a.experience > 2) OR
    (ski.skill = 'SQL' AND a.experience > 1)
    group by eid
    having count(*) = 2
)

so, for every filter you will add OR statement, having will filter employees with all filters passed, just pass appropriate number

Upvotes: 1

Relational division SQL

Answers (3)

Related Questions