Mike Pham
Mike Pham

Reputation: 497

Indexing does not improve performance and sometimes makes it worse

I'm having a table EMPLOYEE(EMP_ID, EMP_NAME, DESC, SALARY) and I want to retrieve all employees that can do magic tricks. Being able to perform magic is classified as a talent and the talent if each employee is stored at the end of each DESC entry of each employee.

If an employee has any talent, then the DESC of that employee will end with 'Talent: __talent_here__'.For example, Joe is good at juggling, then his description would be something like 'Joe Doe, works here since 2011. Talent: juggling'. I can assume that in every description there is only one instance of the string literal 'Talent: ', the talents are always stored at the end of the DESC and not all employees have talents. I also want to sort the employees retrieved by their names and I only want the first 50 sorted values

This is the SELECT statement I'm currently having right now.

SELECT * FROM (
    SELECT E.EMP_NAME FROM EMP E
    WHERE INSTR(E.DESC, 'Talent: ') > 0
        AND INSTR(SUBSTR(E.DESCRIPTION, INSTR(E.DESC, 'Talent: ')), 'magic') > 0
    ORDER BY E.EMP_NAME ASC
) WHERE ROWNUM <= 50
/

This SELECT statement works fine. Now I want to create an index of on the query to improve running time:

CREATE INDEX MAGICIANS ON EMP(
    INSTR(DESCRIPTION, 'Talent: '), 
    INSTR(SUBSTR(DESCRIPTION, INSTR(DESCRIPTION, 'Talent: ')), 'magic'))
/

But this index doesn't seem to work. I'm having about 9000 entries in my table and before creating the index, I the runtime of my select statement is 00.50 seconds. After creating the index, it becomes around 00.49 seconds. I've been experimenting with the index and sometimes the runtime after creating the index is even slightly worse than it was before. (from 00.50 to 00.52 seconds)

Does anyone know why this happens?

Thanks everyone.

Upvotes: 2

Views: 979

Answers (3)

Ronnis
Ronnis

Reputation: 12833

APC has it right. The data model is flawed. Having said that, here are two solutions.

Solution 1. Create a function-based index, extracting the actual talent. This makes it possible to search for juggler and clown as well. There are two downsides to this approach. In order for Oracle to use that index, you'll have to repeat the expression in your queries to exactly match the index definition. The second downside is that you can't easily extract the actual talent unless you again repeat the expression in the select list.

create index employee_ix_talent 
    on employee(regexp_replace(descr, '^(.*)Talent: (.*)$', '\2'));

select emp_id, emp_name, salary, regexp_replace(descr, '^(.*)Talent: (.*)$', '\2') as talent
  from employee e
 where regexp_replace(descr, '^(.*)Talent: (.*)$', '\2') = 'magic';

Solution 2 (preferred). Add a virtual column to the table and index that column. This gets rid of both downsides from the first solution. You can now select the talent column and search against it and most importantly: Your code is now shielded against changes in the expression.

alter table employee 
  add talent generated always as(
         regexp_replace(descr, '^(.*)Talent: (.*)$', '\2')
      );

create index employee_ix_talent on employee(talent);

select emp_id, emp_name, talent, salary
  from employee e
 where talent = 'magic';

Upvotes: 0

APC
APC

Reputation: 146199

Indexes are not magic (oh ho!). Building an index does not guarantee faster retrieval times. To understand why your index doesn't provide quicker results you need to understand how the index works.

Your index is arranged by the offset of the word 'Talent'. All the employees without a talent have an offset of zero, all the others have a bigger number. For them, the offset is a function of the length of their value of DESCRIPTION. Crucially there is (presumably) no relationship between the length of DESCRIPTION and whether the employee can do magic.

Consequently, to find employees who can do magic the database has to visit all index entries where the talent offset is not zero, then read all the index entries where the magic offset is not zero, then read the table records for those entries. The index entries for magicians will be scattered across the index, and those records will be scattered across the table. Almost certainly, it's faster to do a full table scan and pull out the records from the table.

The other thing is, even if your index worked for magicians it would be useless for finding jugglers.

Fundamentally this is a failure of the data model: embedding TALENT in DESCRIPTION breaks First Normal Form. Why it is possible to work around it - with Virtual Columns (or Materialized Views like @TimBiegeisen suggested) - the correct solution would be to model TALENT as a lookup table and have EMPLOYEE reference it through a foreign key (thus eliminating the possibility of employees practising 'magick' or 'Majik'). Such a lookup table would also allow you to have multi-talented employees: you could model employees who were both magicians and jugglers with an intersection table EMPLOYEE_TALENT to join the EMPLOYEE and TALENT tables.

Upvotes: 5

Tim Biegeleisen
Tim Biegeleisen

Reputation: 520928

The problem with your index approach is that an B-Tree index is build around the start of the string in a column, not around a substring buried somewhere in the middle. In order to use an index, ideally there would be a separate bona fide column containing the employee's talent. One approach here would be to create a materialized view containing such a talent column, and then index it:

CREATE MATERIALIZED VIEW mv_name
REFRESH ON DEMAND
AS
SELECT e.*,
    CASE WHEN INSTR(DESCRIPTION, 'Talent: ') > 0
         THEN SUBSTR(DESCRIPTION, INSTR(DESCRIPTION, 'Talent: ') + 8)
     ELSE '' END AS Talent
FROM EMPLOYEE e;

CREATE INDEX idx ON mv_name (Talent);

Now, the following query should be reasonably fast:

SELECT *
FROM mv_name
WHERE Talent = 'magic';

Upvotes: 3

Related Questions