Should relational tables contain duplicate data to speed up queries?

Question

I have a MySQL database with 4 tables:

job
job_application
client
candidate

Each table has its own primary key, i.e. job_id, job_application_id, client_id, candidate_id.

Employers in the client table can post jobs in the job table. The job table contains a client_id field which identifies the client.

Candidates in the candidate table can apply for a job, inserting a row into the job_application table. The job_application table contains a job_id field and a candidate_id field to identify what the job is and who applied for it.

I've run into a bit of a problem writing the queries for employers to manage the job applications they receive. As an example, here is a function I wrote that deletes rows from job_application:

public function deleteJobApplications($job_application_ids) {
    $this->db->query("DELETE ja.* FROM " . DB_PREFIX . "job_application ja LEFT JOIN " . DB_PREFIX . "job j ON (j.job_id = ja.job_id) WHERE ja.job_application_id IN ('" . implode("','", array_map('intval', $job_application_ids)) . "') AND j.client_id = '" . (int)$this->client->getClientId() . "'");
}

Because the client_id is only stored in the job table, I need to LEFT JOIN the job table every time I want to UPDATE or DELETE rows in the job_application table.

Should I add another client_id field to the job_application table, essentially duplicating data already held in the database, or continue with the LEFT JOIN for every UPDATE and DELETE?

Mike Sherrill 'Cat Recall' · Accepted Answer

Your problem isn't that you need to denormalize "job_applications" by introducing "client_id" as a redundant column. (The currently accepted answer is factually incorrect in that regard.) Your problem is that you didn't normalize correctly in the first place. If you had, the column "client_id" would already be in that table, and your problem would never have arisen.

Let's pretend that candidate names, client names, and job names are globally unique.

A table that looks like this will satisfy the predicate: Person named "candidate_name" applies for "job_name" at company "client_name".

job_applications
Person named "candidate_name" applies for "job_name" at company "client_name".

client_name  job_name                candidate_name
---------------------------------------------------
Microsoft    C++ programmer, Excel   Ed Wood 
Microsoft    C++ programmer, Excel   Dane Crute 
Microsoft    C++ programmer, Excel   Vim Winder
Microsoft    C++ programmer, Word    Wil Krug
Microsoft    C++ programmer, Word    Val Stein
Google       Python coder, search    Ed Wood
Google       Programmer, compilers   Ed Wood
Google       Programmer, compilers   Val Stein

Three columns, no id numbers, no nulls, no nonprime attributes, all key. This relation is in 6NF.

It should be obvious that you could create a table for jobs (or job offers) by selecting distinct values from the first two columns. The foreign key reference is obvious.

jobs
Company named "client_name" offers "job_name".

client_name  job_name
----------------------------------
Microsoft    C++ programmer, Excel
Microsoft    C++ programmer, Word
Google       Python coder, search
Google       Programmer, compilers
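
The projection described above can be sketched in code. This is only an illustration and not part of the original answer: it assumes Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, and it loads the sample rows from the "job_applications" table shown earlier.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE job_applications (
        client_name    TEXT NOT NULL,
        job_name       TEXT NOT NULL,
        candidate_name TEXT NOT NULL,
        PRIMARY KEY (client_name, job_name, candidate_name)  -- all key
    )
    """
)

# Sample rows copied from the table in the answer.
rows = [
    ("Microsoft", "C++ programmer, Excel", "Ed Wood"),
    ("Microsoft", "C++ programmer, Excel", "Dane Crute"),
    ("Microsoft", "C++ programmer, Excel", "Vim Winder"),
    ("Microsoft", "C++ programmer, Word", "Wil Krug"),
    ("Microsoft", "C++ programmer, Word", "Val Stein"),
    ("Google", "Python coder, search", "Ed Wood"),
    ("Google", "Programmer, compilers", "Ed Wood"),
    ("Google", "Programmer, compilers", "Val Stein"),
]
conn.executemany("INSERT INTO job_applications VALUES (?, ?, ?)", rows)

# Projecting the first two columns and dropping duplicates gives the jobs.
jobs = conn.execute(
    "SELECT DISTINCT client_name, job_name FROM job_applications "
    "ORDER BY client_name, job_name"
).fetchall()

for client_name, job_name in jobs:
    print(client_name, "|", job_name)
```

The same pattern with a single column in the SELECT DISTINCT list would produce the sets of companies and applicants.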

In a similar way, you can select distinct values from the first column alone for a set of companies, and from the last column alone for a set of applicants. Again, the foreign key references should be obvious.

clients
Company named "client_name" is a client.

client_name
-----------
Microsoft
Google

candidates
Person named "candidate_name" is looking for a job.

candidate_name
--------------
Ed Wood 
Dane Crute 
Vim Winder
Wil Krug
Val Stein

All those tables are in 6NF.
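
For readers who want the foreign key references spelled out, here is one way the four natural-key tables could be declared. It is a sketch under assumptions, not the answer's own DDL: it uses Python's sqlite3 module with in-memory SQLite instead of MySQL, the TEXT column types are guesses, and only a few of the sample rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    CREATE TABLE clients (client_name TEXT PRIMARY KEY);
    CREATE TABLE candidates (candidate_name TEXT PRIMARY KEY);

    CREATE TABLE jobs (
        client_name TEXT NOT NULL REFERENCES clients (client_name),
        job_name    TEXT NOT NULL,
        PRIMARY KEY (client_name, job_name)
    );

    CREATE TABLE job_applications (
        client_name    TEXT NOT NULL,
        job_name       TEXT NOT NULL,
        candidate_name TEXT NOT NULL REFERENCES candidates (candidate_name),
        PRIMARY KEY (client_name, job_name, candidate_name),
        -- An application must point at a job that this client really offers.
        FOREIGN KEY (client_name, job_name)
            REFERENCES jobs (client_name, job_name)
    );

    INSERT INTO clients VALUES ('Microsoft'), ('Google');
    INSERT INTO candidates VALUES ('Ed Wood'), ('Val Stein');
    INSERT INTO jobs VALUES
        ('Microsoft', 'C++ programmer, Excel'),
        ('Google', 'Python coder, search');
    INSERT INTO job_applications VALUES
        ('Microsoft', 'C++ programmer, Excel', 'Ed Wood');
    """
)

# Google does not offer the Excel job, so the composite foreign key
# should reject this row.
try:
    conn.execute(
        "INSERT INTO job_applications VALUES (?, ?, ?)",
        ("Google", "C++ programmer, Excel", "Ed Wood"),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

application_count = conn.execute(
    "SELECT COUNT(*) FROM job_applications"
).fetchone()[0]
print("mismatched application rejected:", rejected)
```

The composite foreign key is the point of the sketch: because "client_name" is part of the reference to "jobs", it belongs in "job_applications" as part of the key rather than as redundant data.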

Augmenting a table with a surrogate key in addition to its natural keys doesn't change the normal form when you do it correctly. Let's replace the natural keys in "job_applications" with your surrogate ID numbers. Making that replacement will result in your table looking like this. (In practice, you'd do the same thing in the other tables, too.)

job_applications
--
client_id
job_id
candidate_id
primary key (client_id, job_id, candidate_id)
other columns go here...

Note that client_id is already in there. If there are no other columns, you're still in at least 5NF.
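
To connect this back to the question's delete function, here is a rough sketch of the surrogate-key version. Everything beyond the table and column names from the question is an assumption: SQLite (through Python's sqlite3 module) stands in for MySQL, the UNIQUE (client_id, job_id) constraint on "job" is added so the composite foreign key has something to reference, and delete_job_applications is a hypothetical helper, not the original PHP method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs

conn.executescript(
    """
    CREATE TABLE client (client_id INTEGER PRIMARY KEY);
    CREATE TABLE candidate (candidate_id INTEGER PRIMARY KEY);

    CREATE TABLE job (
        job_id    INTEGER PRIMARY KEY,
        client_id INTEGER NOT NULL REFERENCES client (client_id),
        UNIQUE (client_id, job_id)  -- target for the composite foreign key
    );

    CREATE TABLE job_application (
        client_id    INTEGER NOT NULL,
        job_id       INTEGER NOT NULL,
        candidate_id INTEGER NOT NULL REFERENCES candidate (candidate_id),
        PRIMARY KEY (client_id, job_id, candidate_id),
        FOREIGN KEY (client_id, job_id) REFERENCES job (client_id, job_id)
    );

    INSERT INTO client VALUES (1), (2);
    INSERT INTO candidate VALUES (10), (11);
    INSERT INTO job VALUES (100, 1), (200, 2);
    INSERT INTO job_application VALUES (1, 100, 10), (1, 100, 11), (2, 200, 10);
    """
)


def delete_job_applications(conn, client_id, job_id, candidate_ids):
    """Delete one client's applications for a job; return the rows removed."""
    if not candidate_ids:
        return 0
    placeholders = ",".join("?" for _ in candidate_ids)
    cur = conn.execute(
        "DELETE FROM job_application "
        "WHERE client_id = ? AND job_id = ? "
        f"AND candidate_id IN ({placeholders})",
        [client_id, job_id, *candidate_ids],
    )
    return cur.rowcount


# Client 2 does not own job 100, so the filter on client_id matches nothing.
blocked = delete_job_applications(conn, 2, 100, [10])
# Client 1 owns job 100 and can remove both applications, with no join.
deleted = delete_job_applications(conn, 1, 100, [10, 11])
remaining = conn.execute("SELECT COUNT(*) FROM job_application").fetchone()[0]
print(blocked, deleted, remaining)
```

With "client_id" in the key of "job_application", the ownership check becomes a plain WHERE condition on the table being modified, and the foreign key keeps each application's client consistent with the job's client.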
