Reputation: 3
I need to delete the data from table1
for the ids present in table2
.
I have executed the following statement but it is going for table scan even though indexes are present:
DELETE FROM table1 t1 WHERE t1.ID IN (SELECT T.ID FROM table2 T) ;
Is there some other way to join tables in DB2 for delete statement so that I can avoid table scan.
Upvotes: 0
Views: 10048
Reputation: 11042
A table scan, in and of itself, is not necessarily a bad thing.
CAVEAT: Without an explain plan or DDL for table1
, you can't get a complete answer for this particular case.
That said, DB2's optimizer will determine the most efficient plan for executing the query. Depending on how much data you're deleting from table1
, it may simply be more efficient to scan the table than it would be to go through the index for each ID to be deleted and then fetch the row(s).
Some of the factors that go into the optimizer's decision for this are:
Table and index statistics
The number of rows to be deleted (and relative to the total number of rows in table1
)
The cluster ratio (or cluster factor) of the potentially useful index(es)
The number of levels in the aforementioned index(es)
Bufferpool size(s)
Here is an extremely simplistic example illustrating the point that a table scan isn't necessarily bad:
Assume that t1
has 500 pages, and the index i1
that you believe should be used has 3 levels and a very low cluster factor. Further, let's assume you're going to delete 200 rows from the table.
To perform the delete via a simple table scan, DB2 will scan 500 data pages for the table.
To do the delete via the index, for each row to delete it will read 3 index pages (root, intermediate and leaf page), plus the data page holding the row. (4 data pages read for each row to be deleted). So, when deleting 200 rows, using the index would mean DB2 is reading 800 pages (60% more than the table scan) – so the optimizer chooses the table scan.
Obviously it's a lot more complicated than this – the size of the index, whether to do a full index scan or (as described) step through the index for each row, buffer pool sizes, cluster ratios and more all play into the optimizer's decision. But there are plenty of cases where the table scan truly is the most efficient method for a given query.
Upvotes: 3