Reputation: 621
I have a question about when does innodb update row data in the buffer and when does the change go to the disk. This question comes from the reading to undo log which says the history data are in the undo log waiting for rollbacks. If the engine needs undo log for rollback, changes of an update query must have changed the row before commit? And then what does the commit do since the data have already been updated.
Upvotes: 0
Views: 116
Reputation: 142208
When you INSERT
, UPDATE
, or DELETE
a row:
Quick summary:
More details (on those steps):
PRIMARY KEY's
BTree. If the block is not in the buffer_pool (which is allocated in RAM), fetch it from disk. (This may involve bumping some other block out of the buffer_pool.UNIQUE
keys: All INSERTs
and UPDATEs
that change the Unique key's column(s) necessarily check for dup-key rather than going through the change buffer.
AUTO_INCREMENT
has some other special actions.
Depending on the values of innodb_flush_log_at_trx_commit
and innodb_doublewrite
something may be flushed to disk at the end of the transaction. These handle "atomic" transactions and "torn pages".
Replication: Other activity may include writing to and syncing the binlog, and pushing data to other nodes in the cluster.
The design is "optimistic" in that it is optimized for COMMIT
at the expense of ROLLBACK
. After a Commit, a process runs around purging the copies that were kept in case of a crash and Rollback. A Rollback is more complex in that it must put back the old copies of the rows. (See also "history list".)
Search for some of the keywords I have mentioned; read some other web pages; then come back with a more specific question.
Commit
Let's look at it from a different side. Each row, including non-yet-committed rows being changed/deleted, has a "transaction id". All the rows for a given transaction have the same id. So, even if there is a crash, InnoDB, knows what to cleanup. COMMIT
and ROLLBACK
need to be 'atomic'; this is aided by having a single write to disk "says it all". The only way for that to be possible is for the transaction id to be the key. Keep in mind, there could be a million rows scattered around the buffer_pool and data files and logs waiting for the commit/rollback.
After the commit/rollback, InnoDB can leisurely run around cleaning up things. For example, until a UPDATE
is committed or rolled back, there are two copies of each row being changed. One of the rows needs to be removed -- eventually. Meanwhile, the two rows are on a "history list". Any other transactions search through the history list to see which one row they are allowed to see -- READ UNCOMMITTED
= latest row that has not been committed / rolled back; READ COMMITTED
= latest row that has been committed / rolled back; etc.
If I understand it correctly, the undo log is an optimization. For example, on a DELETE
the "old values" of the rows are copied to the undo log, and the row is actually deleted from the data BTree. The optimization here is that the undo log is serially written, while the BTree may involve a lot more blocks, scattered around the table. Also, the normal processing of data blocks includes caching them in the buffer_pool. For Commit, the records in the undo log are tossed. For Rollback, there is the tedious effort of using the undo log for reconstruction.
Yes, the history list adds work for all other transactions touching your recently changed rows. But it enables transaction-isolation-modes and aids in recovery from crashes.
Upvotes: 1