Reputation: 1265
I am in a debate with a guy telling me that there is no performance hit for using his function that...
Auto index, repair and optimize MySQL tables using PHP class __destruct() on every single page load by every user who runs the page.
He is asking me why I think it is not good for performance but I do not really know, can someone tell me why such a thing isn't good?
UPDATE His reasoning...
Optimizing & repairing the database tables eliminates the byte size of overhead that can essentially slow down additional queries when multiple connections and table use are concerned. Even with a performance enhanced database schema with indexing enabled.
Not to mention the amount of execution time to perform these operations are slim to none in memory and processor threading.
Opening, reading, writing, updating and then cleaning up after oneself makes more sense to me then performing the same operations and leaving unnecessary overhead behind waiting for a cron entry to clean up.
Upvotes: 3
Views: 2436
Reputation: 21
I have set the following code into our scheduled job running every early morning, when users don't access frequently our site (I read that OPTIMIZE should lock affected tables during optimization).
Advantage using this function is that a single query is composed with all table names comma-separated, instead executing a lot of queries, one for each table to optimize.
It's supposed that you have a db connection opened and a db selected yet, in order to use this function without specifying db connection, db name, etc.
$q = "SHOW TABLE STATUS WHERE Data_Free > '0'";
$res = mysql_query($q); $TOOPT = mysql_num_rows($res);
$N = 0; // number of optimized tables
if(mysql_num_rows($res) > 0)
{
$N = 1;
while($t = mysql_fetch_array($res))
{
$TNAME = $t['Name']; $TSPACE += $t['Data_free'];
if($N < 2)
{
$Q = "OPTIMIZE TABLE ".$TNAME."";
}
else
{
$Q .= ", ".$TNAME."";
}
$N++;
} // endwhile tables
mysql_query($Q);
} // endif tables found (to optimize)
Upvotes: 2
Reputation: 1801
The docs states...
OPTIMIZE TABLE should be used if you have deleted a large part of a table or if you have made many changes to a table with variable-length rows (tables that have VARCHAR, VARBINARY, BLOB, or TEXT columns). Deleted rows are maintained in a linked list and subsequent INSERT operations reuse old row positions. You can use OPTIMIZE TABLE to reclaim the unused space and to defragment the data file. After extensive changes to a table, this statement may also improve performance of statements that use the table, sometimes significantly.
When operations have been performed performance is enhanced by using the 'OPTIMIZE' command.
FLUSH TABLES has several variant forms. FLUSH TABLE is a synonym for FLUSH TABLES, except that TABLE does not work with the WITH READ LOCK variant.
Using the 'FLUSH TABLE' command vs. 'FLUSH TABLES' there is no READ LOCK performed.
Normally, you should never have to run REPAIR TABLE. However, if disaster strikes, this statement is very likely to get back all your data from a MyISAM table. If your tables become corrupted often, you should try to find the reason for it, to eliminate the need to use REPAIR TABLE. See Section C.5.4.2, “What to Do If MySQL Keeps Crashing”, and Section 13.5.4, “MyISAM Table Problems”.
It is my understanding here that if the 'REPAIR TABLE' command is run consistantly the condition concerning large records created would be eliminated as constant maintances is performed. If I am wrong I would like to see benchmarks as my own attempts have not shown anything too detrimental, although the record sets have been under the 10k mark.
Here is the pice of code that is being used and @codedev is asking about...
class db
{
protected static $dbconn;
// rest of database class
public function index($link, $database)
{
$obj = $this->query('SHOW TABLES');
$results = $this->results($obj);
foreach ($results as $key => $value){
if (isset($value['Tables_in_'.$database])){
$this->query('REPAIR TABLE '.$value['Tables_in_'.$database]);
$this->query('OPTIMIZE TABLE '.$value['Tables_in_'.$database]);
$this->query('FLUSH TABLE '.$value['Tables_in_'.$database]);
}
}
}
public function __destruct()
{
$this->index($this->dbconn, $this->configuration['database']);
$this->close();
}
}
Upvotes: 0
Reputation: 562731
Instead of arguing, why not measure? Use a toolkit to profile where you're spending time, such as Instrumentation for PHP. Prove that the optimize step of your PHP request is taking a long time.
Reindexing is an expensive process, at least as costly as doing a table-scan as if you did not have an index. You should build indexes infrequently, so that you serve many PHP requests with the aid of the index for every one time you build the index. If you're building the index on every PHP request, you might as well not define indexes at all, and just run table-scans all the time.
REPAIR TABLE is only relevant for MyISAM tables (and Archive tables). I don't recommend using MyISAM tables. You should just use InnoDB tables. Not only for the sake of performance, but also data safety. MyISAM is very susceptible to data corruption, whereas InnoDB protects against that in most cases by maintaining internal checksums per page.
OPTIMIZE TABLE for an InnoDB table rebuilds all the data and index pages. This is going to be immensely expensive once your table grows to a non-trivial size. Certainly not something you would want to do on every page load. I would even say you should not do OPTIMIZE TABLE during any PHP web request -- do it offline via some script or admin interface.
A table restructure also locks the table. You will queue up all other PHP requests that access the same table for a long time (i.e. minutes or even hours, depending on the size of the table). When each PHP request gets its chance, it'll run another table restructure. It's ridiculous to incur this amount of overhead on every PHP request.
You can also use an analogy: you don't rebuild or optimize an entire table or index during every PHP request for the same reason you don't give your car a tune-up and oil change every time you start it:
It would be expensive and inconvenient to do so, and it would give no extra benefit compared to performing engine maintenance on an appropriate schedule.
Upvotes: 6
Reputation: 63962
Because every single operation (index,repair and optimize) takes considerable time; in fact they are VERY expensive (table locks, disk IO, risk of data loss) if the tables are even slightly big.
Doing this on every page load is definitely not recommended. It should be done only when needed.
Upvotes: 4
Reputation: 7589
Repair table could cause data loss as stated in documentation, so it requires previous backup to avoid further problems. Also, it is intended to be run only in case of disaster (something HAS failed).
Optimize table blocks the table under maintenance so it could cause problems to concurrent users.
My 0.02: Database management operations should not be part of common user transactions as they are expensive in time and resources as your tables grow.
Upvotes: 3