Pankaj Mishra
Pankaj Mishra

Reputation: 43

Does deleting data from a database (using actual delete SQL query) cause huge problems in re-indexing of table data?

Does deleting data from a database (using actual delete SQL query) cause huge problems in re-indexing of table data (say tens of millions of data) thereby increasing system overhead and consuming more resource?

Upvotes: 2

Views: 638

Answers (1)

Venky
Venky

Reputation: 66

Most databases do not immediately delete the index nodes associated with deleted rows from the table. Depending on the specifics of how duplicate index keys are handled this may have no effect at all. For example, one scheme for duplicate key index building is to only have a single B+Tree node for the key value but have it point to a list of rows that contain that key value in the indexed column(s). In that case deleting one or even many of the rows in the table does not affect the efficiency of the index tree at all until all of the rows with that key value have been deleted at which time the key node will be flagged as deleted but not necessarily removed from the tree. Of course in the case of a unique index key any deletion will result in a node that is flagged as deleted. When that happens to many key values near each other on disk the index tree may become inefficient.

One solution is to rebuild the index from scratch either by dropping it and recreating it or if the DBMS has the feature by a “reindex” command. Another solution used by some more advanced database systems is to track whenever a search of an index actually encounters a deleted node. If this happens so often that a configured threshold is exceeded then an automated thread will “clean” the index actually removing deleted nodes and possibly compressing mostly empty index pages or even rebalancing the index tree. The advantage of this “cleaner thread” feature is that inefficient indexes that are not often used, or in which the subtrees of the index containing deleted nodes are no longer accessed (imagine deleting out-dated rows in an index whose lead column is the date used to purge rows), do not take up resources to clean or rebuild them since they are not affecting performance.

Upvotes: 4

Related Questions