Reputation: 10030
I have this query and I want to increase its performance by adding appropriate Indexes.
DELETE FROM MYTAB1 WHERE MYID1 IN (SELECT MYID2 FROM MYTAB2);
I am not familiar with the syntax for indexes or the settings they require, so please advise. The main issue is that MYTAB1 has millions of records, so the query takes a lot of time. MYTAB2, however, has only 1000 records. MYID1 is the primary key of MYTAB1.
I have tried creating the index:
CREATE INDEX IDX_TAB1_ID1 ON MYTAB1(MYID1);
It did not have much impact on the performance of the query.
I ran the explain Plan and got this:
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|
------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 63977 | 11M| | 62216 (2)|
| 1 | DELETE | MYTAB1 | | | | |
| 2 | HASH JOIN RIGHT SEMI| | 63977 | 11M| 7688K| 62216 (2)|
| 3 | TABLE ACCESS FULL | MYTAB2 | 437K| 2561K| | 1189 (2)|
| 4 | TABLE ACCESS FULL | MYTAB1 | 3761K| 678M| | 24718 (4)|
------------------------------------------------------------------------------------
Upvotes: 1
Views: 234
Reputation: 146239
"the thing is MYTAB2 has only 1000 records! "
Yes, but the relevant datapoint is how many records in MYTAB1 match those thousand records. What percentage of the whole table does that figure represent? And what is the distribution of those records in MYTAB1?
If you are looking to delete 20% of the rows in MYTAB1, an index will only make performance worse (if the optimizer is stupid enough to use it). If you are only deleting 0.1% of the records in MYTAB1 but those records are distributed across every block in the table, again a Full Table Scan is the more efficient option.
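You can measure that selectivity yourself before deciding on an index. A sketch using the table and column names from the question:

```sql
-- How many rows would the delete actually touch?
SELECT COUNT(*) AS rows_to_delete
FROM   mytab1
WHERE  myid1 IN (SELECT myid2 FROM mytab2);

-- ...compared to the table's total row count
SELECT COUNT(*) AS total_rows
FROM   mytab1;
```

Divide the first figure by the second to get the percentage the answer above is asking about.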
There are no simple solutions in tuning. It always depends on the interplay of a number of different factors. How often do you want to run this deletion? Do you have an Enterprise Edition licence and lots of spare CPU capacity? And so on.
If MYID1 is the primary key of the MYTAB1 then there should already be a UNIQUE index on that column. So you don't need to create a new index.
Unless you're one of those places which doesn't bother applying integrity constraints on your tables. That is bad practice. Apart from the obvious benefits of enforcing integrity, constraints give the optimizer useful information which leads to better execution plans.
Anyway the root of your problem is clear in the explain plan you have now posted. You say MYTAB2 has only one thousand rows but the optimizer seems to think it has four hundred and thirty-seven thousand rows. So, clearly you need to gather fresh stats on that table:
exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'MYTAB2', estimate_percent=>100)
I suppose the statistics for MYTAB1 are correct and it does have in the order of 3.7 million rows? If so, an indexed lookup would be the most performant option. You need to check that you have a unique index on that primary key column:
select i.index_name, i.uniqueness
from user_indexes i
join user_ind_columns c
on ( i.index_name = c.index_name)
where i.table_name = 'MYTAB1'
and c.column_name = 'MYID1'
If you don't have an index you need to create one:
create unique index mytab1_uidx on mytab1(myid1)
/
If you have an index but it's not unique then you should probably drop it and build a unique index.
Note that if you are mistaken and that column is not a primary key - i.e. it has duplicates - then that CREATE INDEX statement will fail. In that event you have a bigger problem which you will need to ponder.
"however the number of rows that [MYTAB2] contains is highly volatile... basically some rows are added to the table then some are removed and the process continues "
In this scenario having any fixed statistics is unhelpful. A better idea is to force the optimizer to generate stats dynamically at runtime.
exec dbms_stats.delete_table_stats(ownname=>user, tabname=>'MYTAB2')
exec dbms_stats.lock_table_stats(ownname=>user, tabname=>'MYTAB2')
Deleting the table's statistics and then locking it will force the database to generate stats for the table every time you include it in a query, providing you have enabled dynamic sampling. This should generate a better execution plan whenever you run that delete statement, regardless of how many rows MYTAB2 happens to be holding at that moment.
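If you'd rather not touch the stored statistics at all, you can also request dynamic sampling for just this statement with a hint. A sketch - the sampling level of 4 is illustrative, not a recommendation:

```sql
-- Ask the optimizer to sample MYTAB2 at parse time instead of
-- trusting whatever statistics are stored for it.
DELETE FROM mytab1
WHERE  myid1 IN (SELECT /*+ DYNAMIC_SAMPLING(mytab2 4) */ myid2
                 FROM   mytab2);
```

This keeps the workaround local to the one statement whose row counts are volatile.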
Upvotes: 3
Reputation: 231661
The optimizer believes that MYTAB2 has approximately 437,000 rows, so you're trying to delete roughly 11.6% of the rows in the table. If MYTAB2 actually only has 1,000 rows, that implies that the statistics on MYTAB2 are out of date. If you gather statistics on the table
BEGIN
DBMS_STATS.GATHER_TABLE_STATS( <<owner of the table>>,
'MYTAB2' );
END;
and then re-run the query plan, does the plan change? Does the query run more quickly?
The next question is why the optimizer thought that MYTAB2 had so many rows. Is this a temporary table that is not declared as a global temporary table? Was the table much bigger in the past, until you permanently removed 436,000 of its 437,000 rows?
Upvotes: 2
Reputation: 1547
This is a classic problem. Sometimes you are better off creating a new table with the rows you want to save, then renaming the new_table to the original_table.
General outline:
create table new_table as
select * from original_table
where myid1 not in (select myid2 from mytab2)
;
drop table original_table;
rename new_table to original_table;
more details on the activity:
Bulk Delete using CTAS Method
a. Create table new_table with nologging
CREATE TABLE new_table (….) NOLOGGING;
b. Insert /*+ APPEND */ into new_table select the records you want to keep from current_table.
c. Create the indexes on the new_table with NOLOGGING (*)
d. Create constraints, grants etc.
e. Drop current_table.
f. Rename new_table to current.
g. Backup the data.
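A minimal sketch of steps a-f, using the table names and the NOT IN predicate from the outline above (constraints and grants from step d are omitted; mind that NOT IN matches no rows if MYID2 can be NULL):

```sql
-- a. Empty NOLOGGING copy with the same structure
CREATE TABLE new_table NOLOGGING AS
  SELECT * FROM current_table WHERE 1 = 0;

-- b. Direct-path insert of the rows to keep
INSERT /*+ APPEND */ INTO new_table
  SELECT * FROM current_table
  WHERE  myid1 NOT IN (SELECT myid2 FROM mytab2);
COMMIT;

-- c. Rebuild the index without redo
CREATE UNIQUE INDEX new_table_uidx ON new_table(myid1) NOLOGGING;

-- e./f. Swap the tables
DROP TABLE current_table;
RENAME new_table TO current_table;
```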
(*) If the data left is so small or there are a lot of dependencies on the table (views, procedures, functions, etc) the following steps can be used instead of c-g above:
c. Disable constraints on current_table.
d. Truncate current_table;
e. Make indexes unusable
f. Alter current_table NOLOGGING
g. Insert /*+ APPEND */ into current_table
select * from new_table;
h. commit;
i. enable constraints
j. Alter current_Table and indexes to LOGGING
k. Backup the data
l. drop table new_table;
Upvotes: 2
Reputation: 52346
The documentation for index creation in Oracle 10.2 is here.
You need something like:
create index index_name on table_name(column_name);
Upvotes: 1