Reputation: 40140
You might want to have a look at my previous question.
My database schema looks like this
--------------- ---------------
| candidate 1 | | candidate 2 |
--------------- \ --------------
/ \ |
------- -------- etc
|job 1| | job 2 |
------- ---------
/ \ / \
--------- --------- --------- --------
|company | | skills | |company | | skills |
--------- --------- ---------- ----------
Here's my database:
mysql> describe jobs;
+--------------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------+------+-----+---------+----------------+
| job_id | int(11) | NO | PRI | NULL | auto_increment |
| candidate_id | int(11) | NO | MUL | NULL | |
| company_id | int(11) | NO | MUL | NULL | |
| start_date | date | NO | MUL | NULL | |
| end_date | date | NO | MUL | NULL | |
+--------------+---------+------+-----+---------+----------------+
.
mysql> describe candidates;
+----------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+----------+------+-----+---------+----------------+
| candidate_id | int(11) | NO | PRI | NULL | auto_increment |
| candidate_name | char(50) | NO | MUL | NULL | |
| home_city | char(50) | NO | MUL | NULL | |
+----------------+----------+------+-----+---------+----------------+
.
mysql> describe companies;
+-------------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+---------------+------+-----+---------+----------------+
| company_id | int(11) | NO | PRI | NULL | auto_increment |
| company_name | char(50) | NO | MUL | NULL | |
| company_city | char(50) | NO | MUL | NULL | |
| company_post_code | char(50) | NO | | NULL | |
| latitude | decimal(11,8) | NO | | NULL | |
| longitude | decimal(11,8) | NO | | NULL | |
+-------------------+---------------+------+-----+---------+----------------+
.
Note that I should probably call this skill_usage
, as it indicates when a skill was use don a job.
mysql> describe skills;
+----------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| skill_id | int(11) | NO | MUL | NULL | |
| job_id | int(11) | NO | MUL | NULL | |
+----------+---------+------+-----+---------+-------+
.
mysql> describe skill_names;
+------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+----------------+
| skill_id | int(11) | NO | PRI | NULL | auto_increment |
| skill_name | char(32) | NO | MUL | NULL | |
+------------+----------+------+-----+---------+----------------+
So far, my MySQL query looks like this:
SELECT DISTINCT can.candidate_id,
can.candidate_name,
can.candidate_city,
j.job_id,
j.company_id,
DATE_FORMAT(j.start_date, "%b %Y") AS start_date,
DATE_FORMAT(j.end_date, "%b %Y") AS end_date,
s.skill_id
FROM candidates AS can
INNER JOIN jobs AS j ON j.candidate_id = can.candidate_id
INNER JOIN companies AS co ON j.company_id = co.company_id
INNER JOIN skills AS s ON s.job_id = j.job_id
INNER JOIN skill_names AS sn ON s.skill_id = s.skill_id
AND sn.skill_id = s.skill_id
ORDER by can.candidate_id, j.job_id
I am getting output like this, but am not satisfied with it
+--------------+----------------+---------------------+--------+------------+------------+------------+----------+
| candidate_id | candidate_name | candidate_city | job_id | company_id | start_date | end_date | skill_id |
+--------------+----------------+---------------------+--------+------------+------------+------------+----------+
| 1 | Pamela Brown | Cardiff | 1 | 3 | 2019-01-01 | 2019-08-31 | 1 |
| 1 | Pamela Brown | Cardiff | 1 | 3 | 2019-01-01 | 2019-08-31 | 2 |
| 1 | Pamela Brown | Cardiff | 1 | 3 | 2019-01-01 | 2019-08-31 | 1 |
| 1 | Pamela Brown | Cardiff | 2 | 2 | 2018-06-01 | 2019-01-31 | 3 |
| 1 | Pamela Brown | Cardiff | 3 | 1 | 2017-11-01 | 2018-06-30 | 4 |
| 1 | Pamela Brown | Cardiff | 3 | 1 | 2017-11-01 | 2018-06-30 | 5 |
| 1 | Pamela Brown | Cardiff | 3 | 1 | 2017-11-01 | 2018-06-30 | 6 |
| 1 | Pamela Brown | Cardiff | 4 | 3 | 2016-08-01 | 2017-11-30 | 1 |
| 2 | Christine Hill | Salisbury | 5 | 2 | 2018-02-01 | 2019-05-31 | 3 |
Now, I would like to restrict the search, by specifying "skill", like Python, C, C++, UML, etc and company names
The user will enter something like Python AND C++
into a skill search box (and/or Microsoft OR Google
into a company name search box).
How do I feed that into my query? Please bear in mind that each skill ID has a job Id associated with it. Maybe I first need to convert the skill names from the search (in this case Python C++
) into skill Ids? Even so, how do I include that in my query?
Te make a few things clearer:
It looks like I made a start, with that INNER JOIN skills AS s ON s.job_id = j.job_id
, which I think will handle a search for a single skill, given its ... name ? ... Id?
I suppose my question would be how would that query look if, for example, I wanted to restrict the results to anyone who had worked at Microsoft OR Google
and has the skills Python AND C++
?
If I get an example for that, I can extrapolate, but, at this point, I am unsure whether I want more INNER JOINs or WHERE clauses.
I think that I want to extend that second last line AND sn.skill_id = s.skill_id
by paring the skills search string, in my example Python AND C++
and generating some SQL along the lines of AND (s.skill_id = X )
, where X is the skill Id for Python, BUT I don't know how to handle Python AND C++
, or something more complex, like Python AND (C OR C++)
...
Just to be clear, the users are technical and expect to be able to enter complex searches. E.g for skills: (C AND kernel)OR (C++ AND realtime) OR (Doors AND (UML OR QT))
.
The requirements just changed. The person that I am coding this for just told me that if a candidate matches the skill search on any job that he ever worked, then I ought to return ALL jobs for that candidate.
That sounds counter-intuitive to me, but he swears that that is what he wants. I am not sure it can even be done in a single query (I am considering multiple queries; a first t get the candidates with matching skills, then a second to get all of their jobs).
Upvotes: 1
Views: 319
Reputation: 139
Building a bit off previous comments and answers... if handling input like
(A and (B OR C)) OR (D AND (E OR F))
is the blocker you could try moving some of the conditional logic out of the joins and filter instead.
WHERE (
((sn.skill_id LIKE 'A') AND ((sn.skill_id LIKE ('B')) OR (sn.skill_id LIKE('C'))))
AND ((co.company_id IN (1,2,3)) AND ((can.city = 'Springfield') OR (j.city LIKE('Mordor'))))
)
You can build your query string based off used input, search out Id's for selected values and put them into the string and conditionally build as many filters as you like. Think about setting up add_and_filter and add_or_filter functions to construct the <db>.<field> <CONDITION> <VALUE>
statements.
$qs = "";
$qs .= "select val from table";
...
$qs .= " WHERE ";
if($userinput){ $qs += add_and_filter($userinput); }
alternately, look at a map/reduce pattern rather than trying to do it all in SQL?
Upvotes: 1
Reputation: 29619
The first thing I'd say is that your original query probably needs an outer join on the skills table - as it stands, it only retrieves people whose job has a skill (which may not be all jobs). You say that "both the skills & company search box can be empty, which I will interpret as return everything" - this version of the query will not return everything.
Secondly, I'd rename your "skills" table to "job_skills", and your "skill_names" to "skills" - it's more consistent (your companies table is not called company_names).
The query you show has a duplication - AND sn.skill_id = s.skill_id
duplicates the terms of your join. Is that intentional?
To answer your question: I would present the skills to your users in some kind of pre-defined list in your PHP, associated with a skill_id. You could have all skills listed with check boxes, or allow the user to start typing and use AJAX to search for skills matching the text. This solves a UI problem (what if the user tries to search for a skill that doesn't exist?), and makes the SQL slightly easier.
Your query then becomes:
SELECT DISTINCT can.candidate_id,
can.candidate_name,
can.candidate_city,
j.job_id,
j.company_id,
DATE_FORMAT(j.start_date, "%b %Y") AS start_date,
DATE_FORMAT(j.end_date, "%b %Y") AS end_date,
s.skill_id
FROM candidates AS can
INNER JOIN jobs AS j ON j.candidate_id = can.candidate_id
INNER JOIN companies AS co ON j.company_id = co.company_id
INNER JOIN skills AS s ON s.job_id = j.job_id
INNER JOIN skill_names AS sn ON s.skill_id = s.skill_id
AND skill_id in (?, ?, ?)
OR skill_id in (?)
ORDER by can.candidate_id, j.job_id
You need to substitute the question marks for the input your users have entered. EDIT
The problem with allowing users to enter the skills as free text is that you then have to deal with case conversion, white space and typos. For instance, is "python " the same as "Python"? Your user probably intends it to be, but you can't do a simple comparison with skill_name
. If you want to allow free text, one solution might be to add a "normalized" skill_name column in which you store the name in a consistent format (e.g. "all upper case, stripped of whitespace"), and you normalize your input values in the same way, then compare to that normalized column. In that case, the "in clause" becomes something like:
AND skill_id in (select skill_id from skill_name where skill_name_normalized in (?, ?, ?))
The boolean logic you mention - (C OR C++) AND (Agile) - gets pretty tricky. You end up writing a "visual query builder". You may want to Google this term - there are some good examples.
You've narrowed down your requirements somewhat (I may misunderstand). I believe your requirements are
I want to be able to specify zero or more filters.
A filter consists of one or more ANDed skill groups.
A skill group consists of one or more skills.
Filters are ORed together to create a query.
To make this concrete, let's use your example - (A and (B OR C)) OR (D AND (E OR F))
. There are two filters: (A and (B OR C))
and (D AND (E OR F))
. The first filter has two skill groups: A
and (B OR C)
.
It's hard to explain the suggestion in text, but you could create a UI that allows users to specify individual "filters". Each "filter" would allow the user to specify one or more "in clauses", joined with an "and". You could then convert this into SQL - again, using your example, the SQL query becomes
SELECT DISTINCT can.candidate_id,
can.candidate_name,
can.candidate_city,
j.job_id,
j.company_id,
DATE_FORMAT(j.start_date, "%b %Y") AS start_date,
DATE_FORMAT(j.end_date, "%b %Y") AS end_date,
s.skill_id
FROM candidates AS can
INNER JOIN jobs AS j ON j.candidate_id = can.candidate_id
INNER JOIN companies AS co ON j.company_id = co.company_id
INNER JOIN skills AS s ON s.job_id = j.job_id
INNER JOIN skill_names AS sn ON s.skill_id = s.skill_id
AND
(skill_id in (A) and skil_id in (B, C))
OR
(skill_id in (D) and skil_id in (E, F))
ORDER by can.candidate_id, j.job_id
Upvotes: 3