Reputation: 497
I'm having a table EMPLOYEE(EMP_ID, EMP_NAME, DESC, SALARY)
and I want to retrieve all employees that can do magic tricks. Being able to perform magic is classified as a talent and the talent if each employee is stored at the end of each DESC
entry of each employee.
If an employee has any talent, then the DESC
of that employee will end with 'Talent: __talent_here__'
.For example, Joe is good at juggling, then his description would be something like 'Joe Doe, works here since 2011. Talent: juggling'
. I can assume that in every description there is only one instance of the string literal 'Talent: '
, the talents are always stored at the end of the DESC
and not all employees have talents. I also want to sort the employees retrieved by their names and I only want the first 50 sorted values
This is the SELECT
statement I'm currently having right now.
SELECT * FROM (
SELECT E.EMP_NAME FROM EMP E
WHERE INSTR(E.DESC, 'Talent: ') > 0
AND INSTR(SUBSTR(E.DESCRIPTION, INSTR(E.DESC, 'Talent: ')), 'magic') > 0
ORDER BY E.EMP_NAME ASC
) WHERE ROWNUM <= 50
/
This SELECT
statement works fine. Now I want to create an index of on the query to improve running time:
CREATE INDEX MAGICIANS ON EMP(
INSTR(DESCRIPTION, 'Talent: '),
INSTR(SUBSTR(DESCRIPTION, INSTR(DESCRIPTION, 'Talent: ')), 'magic'))
/
But this index doesn't seem to work. I'm having about 9000 entries in my table and before creating the index, I the runtime of my select statement is 00.50 seconds. After creating the index, it becomes around 00.49 seconds. I've been experimenting with the index and sometimes the runtime after creating the index is even slightly worse than it was before. (from 00.50 to 00.52 seconds)
Does anyone know why this happens?
Thanks everyone.
Upvotes: 2
Views: 979
Reputation: 12833
APC has it right. The data model is flawed. Having said that, here are two solutions.
Solution 1. Create a function-based index, extracting the actual talent. This makes it possible to search for juggler and clown as well. There are two downsides to this approach. In order for Oracle to use that index, you'll have to repeat the expression in your queries to exactly match the index definition. The second downside is that you can't easily extract the actual talent unless you again repeat the expression in the select list.
create index employee_ix_talent
on employee(regexp_replace(descr, '^(.*)Talent: (.*)$', '\2'));
select emp_id, emp_name, salary, regexp_replace(descr, '^(.*)Talent: (.*)$', '\2') as talent
from employee e
where regexp_replace(descr, '^(.*)Talent: (.*)$', '\2') = 'magic';
Solution 2 (preferred). Add a virtual column to the table and index that column. This gets rid of both downsides from the first solution. You can now select the talent column and search against it and most importantly: Your code is now shielded against changes in the expression.
alter table employee
add talent generated always as(
regexp_replace(descr, '^(.*)Talent: (.*)$', '\2')
);
create index employee_ix_talent on employee(talent);
select emp_id, emp_name, talent, salary
from employee e
where talent = 'magic';
Upvotes: 0
Reputation: 146199
Indexes are not magic (oh ho!). Building an index does not guarantee faster retrieval times. To understand why your index doesn't provide quicker results you need to understand how the index works.
Your index is arranged by the offset of the word 'Talent'. All the employees without a talent have an offset of zero, all the others have a bigger number. For them, the offset is a function of the length of their value of DESCRIPTION
. Crucially there is (presumably) no relationship between the length of DESCRIPTION
and whether the employee can do magic.
Consequently, to find employees who can do magic the database has to visit all index entries where the talent
offset is not zero, then read all the index entries where the magic
offset is not zero, then read the table records for those entries. The index entries for magicians will be scattered across the index, and those records will be scattered across the table. Almost certainly, it's faster to do a full table scan and pull out the records from the table.
The other thing is, even if your index worked for magicians it would be useless for finding jugglers.
Fundamentally this is a failure of the data model: embedding TALENT
in DESCRIPTION
breaks First Normal Form. Why it is possible to work around it - with Virtual Columns (or Materialized Views like @TimBiegeisen suggested) - the correct solution would be to model TALENT
as a lookup table and have EMPLOYEE
reference it through a foreign key (thus eliminating the possibility of employees practising 'magick' or 'Majik'). Such a lookup table would also allow you to have multi-talented employees: you could model employees who were both magicians and jugglers with an intersection table EMPLOYEE_TALENT
to join the EMPLOYEE
and TALENT
tables.
Upvotes: 5
Reputation: 520928
The problem with your index approach is that an B-Tree index is build around the start of the string in a column, not around a substring buried somewhere in the middle. In order to use an index, ideally there would be a separate bona fide column containing the employee's talent. One approach here would be to create a materialized view containing such a talent column, and then index it:
CREATE MATERIALIZED VIEW mv_name
REFRESH ON DEMAND
AS
SELECT e.*,
CASE WHEN INSTR(DESCRIPTION, 'Talent: ') > 0
THEN SUBSTR(DESCRIPTION, INSTR(DESCRIPTION, 'Talent: ') + 8)
ELSE '' END AS Talent
FROM EMPLOYEE e;
CREATE INDEX idx ON mv_name (Talent);
Now, the following query should be reasonably fast:
SELECT *
FROM mv_name
WHERE Talent = 'magic';
Upvotes: 3