Reputation: 311

Database engine internals

I have been using access and mysql... recently i started working withe standard file operations in linux. Since the databases store data in files, do they perform the file operations on every sql transaction or only on closing/opening database?. I would think its more efficient to store the data in ram with efficient data structures to handle the updates, and only write to file closing a connection. I am not interested in relational algebra for now but where can i find more details to answer my questions?

Upvotes: 2

Answers (2)

Tim M.

Reputation: 54359

It's much more complex than just a file system versus RAM discussion. MS SQL Server has well-documented data storage methodologies (I'm sure MySQL does too, not sure about Access). I will answer from my experience with MS SQL Server.

Here's an article to start with: http://msdn.microsoft.com/en-us/library/ms189051.aspx

To give a little insight around your question:

I would think its more efficient to store the data in ram with efficient data structures to handle the updates, and only write to file closing a connection.

Database engines absolutely use RAM to improve performance, especially for reads (including indexes). Usually (not always) the number of reads is greater than the number of writes, even in an OLTP environment. Reading data can actually be much more expensive than writing data; however writing data carries with it the penalty of incurring locking which can have significant ramifications.
The actual buffering implementation will vary by database and also may depend on whether or not a transaction is being used to write data. There are things which the database engine must consider, such as what if another caller requests a dirty read? or what if the system fails in the middle of the operation? What if the database servers are clustered or replicated? It doesn't mean that this is impossible, just that it is a complex matter.
One example of integrity assurance: in a MS SQL Server mirrored environment, two servers may be written to with a third server acting as a witness. The writes can be asynchronous, or the writes may be synchronized requiring success on the part of both parties to continue. This is slower but guarantees (within reason) that the data has actually been sent to both nodes and that both nodes are confident that their persistence operation succeeded.
Connections are often pooled and not truly closed, as opening and closing is quite expensive. Connections can (and often should) be actively reused (regardless of pooling).

I found this article on MS SQL Server buffer writes interesting: http://msdn.microsoft.com/en-us/library/aa337525.aspx.

It may answer your question more specifically about when data is written to disk.

Upvotes: 2

crowne

Reputation: 8534

In order to maintain integrity, writes will need to be done after every transaction has completed.
Transactions are completed, when the COMMIT statement is executed.
There may be many statements (SELECT, INSERT, UPDATE & DELETE) within a single transaction.
Writing only on open and close will not provide adequate support for ACID transactions.

Many databases maintain a separate internal log with the purpose of recording the transactions, the write to the log is a critical operation, and the log can be used rebuild the database in a crash recovery situation.

Some database implementations may choose to persist the actual entity data to disk at a later point, as long as the logs have been successfully written to disk.
Subsequent reads of the same data may be retrieved from buffers in memory, if there is a crash before the entity data is written to disk, the recovery procedures can apply the changes from the logs at the next startup, allowing the database to maintain internal consistency.

Upvotes: 3

Database engine internals

Answers (2)

Related Questions