Reputation: 65
The customer table contains 9.5 million records. The customer_id
column is the primary key. The database is oracle.
Questions:
To improve performance, should the composite key contain the customer_id
column also?
For eg: The composite key contains (occupation_type
in column position 1 and customer_id
in column position 2). Is this useful? The customer_id
is already indexed as part of primary key.
in what column position the customer_id
be included, first or last, what is the basis of doing so? Suppose the composite key contains 3 columns, in which position should the customer_id
column be?
what happens if the query contains the where clause ONLY for a column which comes in column position 3 of composite index? is the composite index useful then?
if the scenario is to run queries such as (in no particular order), should the indexes be built independent or in a composite way. Which is helpful in the scenario below?
Query1: where tableA.columnA = value
Query2: where tableA.columnB = value
Query3: where tableA.columnC = value
Upvotes: 0
Views: 1042
Reputation: 11365
This is the data setup I did for your case.
CREATE TABLE TESTING
(
COLUMN1 NUMBER ( 10 ) PRIMARY KEY,
COLUMN2 CHAR ( 12 ),
COLUMN3 VARCHAR2 ( 12 ),
COLUMN4 VARCHAR2 ( 12 ),
COLUMN5 VARCHAR2 ( 12 )
);
INSERT INTO TESTING
(SELECT ROWNUM,
DBMS_RANDOM.STRING ( 'U', 5 ),
DBMS_RANDOM.STRING ( 'U', 10 ),
DBMS_RANDOM.STRING ( 'L', 10 ),
DBMS_RANDOM.STRING ( 'L', 10 )
FROM ALL_OBJECTS
WHERE ROWNUM <= 50000 );
Q1 : A1 Completely depends on what you want. Since you have mentioned. "To improve performance"
, we can't provide any opinion. But in general for a composite index, it is always stored, sorted in key order. When you select some thing which is a part of any index (NOT NULL), then in that case, the FAST FULL SCAN
is used.
SET AUTOTRACE ON
SELECT COUNT(DISTINCT COLUMN1) FROM TESTING;
EXECUTION PLAN
----------------------------------------------------------
0 SELECT STATEMENT OPTIMIZER MODE=ALL_ROWS (COST=29 CARD=1 BYTES=13)
1 0 SORT AGGREGATE (CARD=1 BYTES=13)
2 1 VIEW SYS.VW_DAG_0 (COST=29 CARD=40 K BYTES=517 K)
3 2 HASH GROUP BY (COST=29 CARD=40 K BYTES=517 K)
4 3 INDEX FAST FULL SCAN SYS_C0016669 (COST=27 CARD=40 K BYTES=517 K)
In case the where clause contains the indexed column then the Index Range Scan takes place
SET AUTOTRACE ON
SELECT COUNT(DISTINCT COLUMN1) FROM TESTING WHERE COLUMN1 < 10;
EXECUTION PLAN
----------------------------------------------------------
0 SELECT STATEMENT OPTIMIZER MODE=ALL_ROWS (COST=2 CARD=1 BYTES=13)
1 0 SORT AGGREGATE (CARD=1 BYTES=13)
2 1 VIEW SYS.VW_DAG_0 (COST=2 CARD=5 BYTES=65)
3 2 SORT GROUP BY NOSORT (COST=2 CARD=5 BYTES=65)
4 3 INDEX RANGE SCAN SYS_C0016669 (COST=2 CARD=5 BYTES=65)
Let me create a composite index
CREATE INDEX IDX_TEST_1 ON TESTING(COLUMN2, COLUMN3);
CREATE INDEX IDX_TEST_2 ON TESTING(COLUMN1, COLUMN2);
This has no impact unless you use them in your predicates. So the above statement will not change the plan after the index
SET AUTOTRACE ON
SELECT COUNT(DISTINCT COLUMN1) FROM TESTING WHERE COLUMN1 < 10;
EXECUTION PLAN
----------------------------------------------------------
0 SELECT STATEMENT OPTIMIZER MODE=ALL_ROWS (COST=2 CARD=1 BYTES=13)
1 0 SORT AGGREGATE (CARD=1 BYTES=13)
2 1 VIEW SYS.VW_DAG_0 (COST=2 CARD=5 BYTES=65)
3 2 SORT GROUP BY NOSORT (COST=2 CARD=5 BYTES=65)
4 3 INDEX RANGE SCAN SYS_C0016669 (COST=2 CARD=5 BYTES=65)
Q2 : A2. If CUSTOMER_ID will be used in the where clause aling with another column frequently then in that case, go for the composite index. The first column in that should be the one that is sorted. So in your case it should be the one with the index
an index on (x,y) would have a low clustering factor. (Given that the data is sorted by x) an index on (y,x) would have a high clustering factor (Given that the data is not sorted by y)
An INDEX_FFS (index fast full scan) is a process where by we read the index data as if it were a table. Normally, an index is processed a block at a time. Go to the root block, use the branch blocks to navigate. It is a data structure.
In a FAST FULL SCAN, we just read the entire structure as it exists on disk. We do not treat it as an index, but more like a table. We read it out of order (data does not come back sorted from an index ffs) and we read it using multiblock IO.
This allows us to use an index like a "skinnier" version of the table.
We can:
range scan and index (ascending or descending) combine indexes (what we do with bit mapped indexes to process complex AND/OR conditions) join indexes (use two or more indexes joined together) full scan indexes as above.
You should always Index a table on how you retrieve the data.
Q3 : A3
SET AUTOTRACE ON
SELECT COUNT(DISTINCT COLUMN1) FROM TESTING WHERE COLUMN3 = 'ASASDFF';
EXECUTION PLAN
----------------------------------------------------------
0 SELECT STATEMENT OPTIMIZER MODE=ALL_ROWS (COST=138 CARD=1 BYTES=13)
1 0 SORT AGGREGATE (CARD=1 BYTES=13)
2 1 VIEW SYS.VW_DAG_0 (COST=138 CARD=5 BYTES=65)
3 2 HASH GROUP BY (COST=138 CARD=5 BYTES=105)
4 3 TABLE ACCESS FULL TESTING (COST=137 CARD=5 BYTES=105)
Here though we have a composite index comprising the COLUMN3, it is useless
Hence for Q4 : A4, It is always better to have individual ones than that of having a composite one
Upvotes: 2
Reputation: 101
The column with the highest number of distinct values should go first. That is, if OCCUPATION_TYPE has less distinct values than CUSTOMER_ID, and you plan to query
WHERE CUSTOMER_ID = X AND OCCUPATION_TYPE = Y
then create the compound index with CUSTOMER_ID first and OCCUPATION_TYPE second.
See above. It should probably be first IF it will be included in the query.
It may be, but the optimizer will have to evaluate whether doing an index skip scan and then retrieving the actual data is faster than a full table scan.
Based on your example queries, I don't see why you would go with composite over several single-column indexes.
Edit: fixed numbering.
Upvotes: 0