Reputation: 1607
I recently took a simple skills test to which I was given the feedback:
"There is one small indexing optimisation which could improve performance."
The skills test involved creating a birthday e-card online app; users sign up, then on their birthday an email is sent to them. I was to presume this is on a Linux server running a mysql database with around 4 million records.
I've tried my best to research further issues with indexing on my database, but to my best research and knowledge, im struggling to find any improvements. I'd really appreciate any pointers here so I can learn where I went wrong;
Database:
CREATE TABLE `birthdayCard`
(
`Email` VARCHAR(255),
`FirstName` CHAR(30),
`LastName` CHAR(30),
`Dob` DATE,
PRIMARY KEY (Email),
INDEX(Dob)
);
Query:
SELECT * FROM `birthdayCard`
WHERE MONTH(Dob) = MONTH(NOW())
AND DAY(Dob) = DAY(NOW());
Upvotes: 2
Views: 304
Reputation: 1607
I managed to receive some feedback from my test directly from the company, and as their response hasn't been shared so far, I thought I'd share that too as an option.
The problem comes, as highlighted by most, with the DOB. From what was explained to me, when querying a DOB stored as a date as I have done, a query looking for the day and month performs the query similar to a LIKE statement.
This effectually means the stored value 1970-01-01 (the format dates are stored) would be queried similar to:
WHERE Dob LIKE '%01-01'
This would mean the MYSQL engine would cycle through the unneeded "1970-" part of the value.
The proposed solution would then be to only store (and index) the part of the date needed (month, day). A 4 character length integer would be perfect for this, especially if we performed a query which would select from the left using LEFT, and the RIGHT SELECT function.
Table:
CREATE TABLE `birthdayCard`
(
`Email` VARCHAR(255),
`FirstName` CHAR(30),
`LastName` CHAR(30),
`Dob` INT(4),
PRIMARY KEY (Email),
INDEX(Dob)
);
Query:
SELECT * FROM `birthdayCard`
WHERE LEFT(Dob, 2) = MONTH(NOW())
AND RIGHT(Dob, 2) = DAY(NOW());
It's not that the other methods won't work, or my example was wrong, but speed wise - this proposed method seems to me at least, to be the fastest. In case you're interested; this solution was provided by a hardy SQL veteran and CEO with some 20 programming years behind him.
Upvotes: 0
Reputation: 52000
As explained in the comment above, the INDEX(Dob)
is not used -- since this is an index on year-month-day. You have to create an index on month-day.
Probably not the most elegant solution, but:
CREATE TABLE `birthdayCard`(`Email` VARCHAR(255), `FirstName` CHAR(30), `LastName` CHAR(30),
`Mob` int, `Dob` int,
PRIMARY KEY (Email), INDEX(`Mob`, `Dob`));
See http://sqlfiddle.com/#!2/db82ff/1
For a better( ?) answer: as MySQL does not support computed columns, you might need triggers to populate a "month-day" columns, and have an index on it:
CREATE TABLE `birthdayCard`(`Email` VARCHAR(255), `FirstName` CHAR(30), `LastName` CHAR(30),
`Dob` DATE,
`Birthday` CHAR(5),
PRIMARY KEY (Email), INDEX(`Birthday`));
CREATE TRIGGER ins_bithdayCard BEFORE INSERT ON `birthdayCard`
FOR EACH ROW
SET NEW.`birthday` = DATE_FORMAT(NEW.`Dob`, "%m%d");
CREATE TRIGGER upd_bithdayCard BEFORE UPDATE ON `birthdayCard`
FOR EACH ROW
SET NEW.`birthday` = DATE_FORMAT(NEW.`Dob`, "%m%d");
This allow "simple" inserts, preserving if needed the full Dob
as in your original example:
insert into birthdayCard (Email, FirstName, LastNAme, Dob)
values ("[email protected]", "Sylvain", "Leroux", '2013-08-05');
The SELECT
query has to be modified to use the new "search" column:
SELECT * FROM `birthdayCard` WHERE Birthday = DATE_FORMAT(NOW(), "%m%d");
Sett http://sqlfiddle.com/#!2/66111/3
Upvotes: 2
Reputation: 52107
I don't know about a "small" improvement, but I can think of a big one...
The index can only be used on "naked" fields, so your current query causes an expensive full table scan. You should transform the WHERE expression so the field is not enclosed by the function call:
SELECT * FROM `birthdayCard`
WHERE
Dob >= CURDATE()
AND Dob < DATE_ADD(CURDATE(), INTERVAL 1 DAY);
Which can be satisfied by an index range scan:
ID SELECT_TYPE TABLE TYPE POSSIBLE_KEYS KEY KEY_LEN REF ROWS EXTRA
1 SIMPLE birthdayCard range Dob Dob 4 (null) 1 Using where
Upvotes: 0
Reputation: 18550
A few options I would consider.
Creating a column and selecting date_diff(dob
, interval - year(dob
) YEARS)
This gives a date of 0000-08-04 which you can query easily. You can use an trigger to keep new columns in sync.
instead of using a date type. use a char(10). When the column try has been changed update the column to be REVERSE(dob
). You can then query for the day and month pretty quick while keeping it in 1 column and keeping the year. This has the advantage of keeping 1 column and all the information
Using some maths - though no method are springing to mind. Im sure there are some
Upvotes: -1