oracle composite index column positions and non-composite index

Question

The customer table contains 9.5 million records. The customer_id column is the primary key. The database is oracle.

Questions:

To improve performance, should the composite key contain the customer_id column also?
For eg: The composite key contains (occupation_type in column position 1 and customer_id in column position 2). Is this useful? The customer_id is already indexed as part of primary key.
in what column position the customer_id be included, first or last, what is the basis of doing so? Suppose the composite key contains 3 columns, in which position should the customer_id column be?
what happens if the query contains the where clause ONLY for a column which comes in column position 3 of composite index? is the composite index useful then?
if the scenario is to run queries such as (in no particular order), should the indexes be built independent or in a composite way. Which is helpful in the scenario below?

Query1: where tableA.columnA = value
Query2: where tableA.columnB = value
Query3: where tableA.columnC = value

Srini V · Accepted Answer

This is the data setup I did for your case.

CREATE TABLE TESTING
(
   COLUMN1   NUMBER ( 10 ) PRIMARY KEY,
   COLUMN2   CHAR ( 12 ),
   COLUMN3   VARCHAR2 ( 12 ),
   COLUMN4   VARCHAR2 ( 12 ),
   COLUMN5   VARCHAR2 ( 12 )
);

INSERT INTO TESTING
   (SELECT ROWNUM,
           DBMS_RANDOM.STRING ( 'U', 5 ),
           DBMS_RANDOM.STRING ( 'U', 10 ),
           DBMS_RANDOM.STRING ( 'L', 10 ),
           DBMS_RANDOM.STRING ( 'L', 10 )
    FROM   ALL_OBJECTS
    WHERE  ROWNUM <= 50000 );

Q1 : A1 Completely depends on what you want. Since you have mentioned. "To improve performance", we can't provide any opinion. But in general for a composite index, it is always stored, sorted in key order. When you select some thing which is a part of any index (NOT NULL), then in that case, the FAST FULL SCAN is used.

SET AUTOTRACE ON

SELECT  COUNT(DISTINCT COLUMN1) FROM TESTING;

EXECUTION PLAN
----------------------------------------------------------
   0       SELECT STATEMENT OPTIMIZER MODE=ALL_ROWS (COST=29 CARD=1 BYTES=13)
   1    0    SORT AGGREGATE (CARD=1 BYTES=13)
   2    1      VIEW SYS.VW_DAG_0 (COST=29 CARD=40 K BYTES=517 K)
   3    2        HASH GROUP BY (COST=29 CARD=40 K BYTES=517 K)
   4    3          INDEX FAST FULL SCAN SYS_C0016669 (COST=27 CARD=40 K BYTES=517 K)

In case the where clause contains the indexed column then the Index Range Scan takes place

SET AUTOTRACE ON

SELECT  COUNT(DISTINCT COLUMN1) FROM TESTING WHERE COLUMN1 < 10;

EXECUTION PLAN
----------------------------------------------------------
   0       SELECT STATEMENT OPTIMIZER MODE=ALL_ROWS (COST=2 CARD=1 BYTES=13)
   1    0    SORT AGGREGATE (CARD=1 BYTES=13)
   2    1      VIEW SYS.VW_DAG_0 (COST=2 CARD=5 BYTES=65)
   3    2        SORT GROUP BY NOSORT (COST=2 CARD=5 BYTES=65)
   4    3          INDEX RANGE SCAN SYS_C0016669 (COST=2 CARD=5 BYTES=65)

Let me create a composite index

CREATE INDEX IDX_TEST_1 ON TESTING(COLUMN2, COLUMN3);
CREATE INDEX IDX_TEST_2 ON TESTING(COLUMN1, COLUMN2);

This has no impact unless you use them in your predicates. So the above statement will not change the plan after the index

SET AUTOTRACE ON

SELECT  COUNT(DISTINCT COLUMN1) FROM TESTING WHERE COLUMN1 < 10;


EXECUTION PLAN
----------------------------------------------------------
   0       SELECT STATEMENT OPTIMIZER MODE=ALL_ROWS (COST=2 CARD=1 BYTES=13)
   1    0    SORT AGGREGATE (CARD=1 BYTES=13)
   2    1      VIEW SYS.VW_DAG_0 (COST=2 CARD=5 BYTES=65)
   3    2        SORT GROUP BY NOSORT (COST=2 CARD=5 BYTES=65)
   4    3          INDEX RANGE SCAN SYS_C0016669 (COST=2 CARD=5 BYTES=65)

Q2 : A2. If CUSTOMER_ID will be used in the where clause aling with another column frequently then in that case, go for the composite index. The first column in that should be the one that is sorted. So in your case it should be the one with the index

an index on (x,y) would have a low clustering factor. (Given that the data is sorted by x) an index on (y,x) would have a high clustering factor (Given that the data is not sorted by y)

AskTom said once

An INDEX_FFS (index fast full scan) is a process where by we read the index data as if it were a table. Normally, an index is processed a block at a time. Go to the root block, use the branch blocks to navigate. It is a data structure.

In a FAST FULL SCAN, we just read the entire structure as it exists on disk. We do not treat it as an index, but more like a table. We read it out of order (data does not come back sorted from an index ffs) and we read it using multiblock IO.

This allows us to use an index like a "skinnier" version of the table.

We can:

range scan and index (ascending or descending) combine indexes (what we do with bit mapped indexes to process complex AND/OR conditions) join indexes (use two or more indexes joined together) full scan indexes as above.

You should always Index a table on how you retrieve the data.

Q3 : A3

SET AUTOTRACE ON
SELECT  COUNT(DISTINCT COLUMN1) FROM TESTING WHERE COLUMN3 = 'ASASDFF';

EXECUTION PLAN
----------------------------------------------------------
   0       SELECT STATEMENT OPTIMIZER MODE=ALL_ROWS (COST=138 CARD=1 BYTES=13)
   1    0    SORT AGGREGATE (CARD=1 BYTES=13)
   2    1      VIEW SYS.VW_DAG_0 (COST=138 CARD=5 BYTES=65)
   3    2        HASH GROUP BY (COST=138 CARD=5 BYTES=105)
   4    3          TABLE ACCESS FULL TESTING (COST=137 CARD=5 BYTES=105)

Here though we have a composite index comprising the COLUMN3, it is useless

Hence for Q4 : A4, It is always better to have individual ones than that of having a composite one

oracle composite index column positions and non-composite index

Answers (2)

Related Questions