Reputation: 25359
I have a cron job that updates a large number of rows in a database. Some of the rows are new and are therefore inserted, and some are updates of existing ones and are therefore updated.
I use INSERT ... ON DUPLICATE KEY UPDATE for the whole data set and get it done in one call.
But I actually know which rows are new and which are updates, so I could also do the inserts and updates separately.
Will separating the inserts and updates have an advantage in terms of performance? What are the mechanics behind this?
Thanks!
Upvotes: 20
Views: 23219
Reputation: 1435
I got a totally different result: INSERT ... ON DUPLICATE KEY UPDATE is faster than UPDATE!
MySQL Version
innodb_version 5.6.13
protocol_version 10
version 5.6.13-enterprise-commercial-advanced
version_compile_machine x86_64
version_compile_os osx10.7
Result
SELECT udf_CreateCounterID(0,CURRENT_DATE);
SELECT @update, @updateend, @updatediff, @insertupdate, @insertupdate_end, @insertupdatediff, @keyval, @countlmt;
@update=2013-09-12 17:32:27
@updateend=2013-09-12 17:33:01
@updatediff=34
@insertupdate=2013-09-12 17:32:00
@insertupdate_end=2013-09-12 17:32:27
@insertupdatediff=27
@keyval=13
@countlmt=1000000
Table
CREATE TABLE `sys_CounterID` (`exch_year` int(11) NOT NULL,
`nextID` int(11) NOT NULL,
PRIMARY KEY (`exch_year`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Test Function
CREATE DEFINER=`root`@`localhost` FUNCTION `udf_CreateCounterID`(exchID SMALLINT, listyear DATE) RETURNS int(10) unsigned
BEGIN
    DECLARE keyvalue INT UNSIGNED DEFAULT 0;

    SET @countlmt = 1000000;
    SET keyvalue = ((exchID % 512) << 9) + EXTRACT(YEAR FROM listyear) % 100;
    SET @keyval = keyvalue;
    SET @retVal = 0;

    -- Time the INSERT ... ON DUPLICATE KEY UPDATE loop
    SET @count = @countlmt;
    SET @insertupdate = SYSDATE();
    WHILE @count > 0 DO
        INSERT INTO `sys_CounterID` (`exch_year`, nextID)
        VALUES (keyvalue, 1)
        ON DUPLICATE KEY UPDATE
            nextID = (@retVal := nextID + 1);
        SET @count = @count - 1;
    END WHILE;
    SET @insertupdate_end = SYSDATE();
    SET @insertupdatediff = TIMESTAMPDIFF(SECOND, @insertupdate, @insertupdate_end);

    -- Time the plain UPDATE loop
    SET @count = @countlmt;
    SET @update = SYSDATE();
    WHILE @count > 0 DO
        UPDATE sys_CounterID
        SET nextID = (@retVal := nextID + 1)
        WHERE exch_year = keyvalue;
        SET @count = @count - 1;
    END WHILE;
    SET @updateend = SYSDATE();
    SET @updatediff = TIMESTAMPDIFF(SECOND, @update, @updateend);

    RETURN @retVal;
END
Upvotes: 5
Reputation: 18273
In my test, using ON DUPLICATE KEY UPDATE is on average 1.3x slower than using separate INSERT/UPDATE statements. This is my test:
INSERT/UPDATE (54.07 sec)
<?php
$mtime = microtime();
$mtime = explode(" ",$mtime);
$mtime = $mtime[1] + $mtime[0];
$starttime = $mtime;
?>
<?php
set_time_limit(0);
$con = mysql_connect('localhost', 'root', '');
mysql_select_db('test');
for ($i = 1; $i <= 1000; $i = $i + 2)
{
mysql_query("
INSERT INTO users
VALUES(NULL, 'username{$i}', 'email.{$i}', 'password{$i}')
");
}
for ($i = 1; $i <= 1000; $i++)
{
if ($i % 2 == 0)
{
mysql_query("
INSERT INTO users
VALUES(NULL, 'username{$i}', 'email.{$i}', 'password{$i}')
");
}
else
{
mysql_query("
UPDATE users
SET username = 'username{$i}', email = 'email{$i}', password = 'password{$i}'
WHERE username = 'username{$i}'
");
}
}
?>
<?php
$mtime = microtime();
$mtime = explode(" ",$mtime);
$mtime = $mtime[1] + $mtime[0];
$endtime = $mtime;
$totaltime = ($endtime - $starttime);
echo "This page was created in ".$totaltime." seconds";
?>
ON DUPLICATE KEY UPDATE (70.4 sec)
<?php
$mtime = microtime();
$mtime = explode(" ",$mtime);
$mtime = $mtime[1] + $mtime[0];
$starttime = $mtime;
?>
<?php
set_time_limit(0);
$con = mysql_connect('localhost', 'root', '');
mysql_select_db('test');
for ($i = 1; $i <= 1000; $i = $i + 2)
{
mysql_query("
INSERT INTO users
VALUES(NULL, 'username{$i}', 'email.{$i}', 'password{$i}')
");
}
for ($i = 1; $i <= 1000; $i++)
{
mysql_query("
INSERT INTO users
VALUES({$i}, 'username{$i}', 'email.{$i}', 'password{$i}')
ON DUPLICATE KEY UPDATE
username = 'username{$i}', email = 'email{$i}', password = 'password{$i}'
");
}
?>
<?php
$mtime = microtime();
$mtime = explode(" ",$mtime);
$mtime = $mtime[1] + $mtime[0];
$endtime = $mtime;
$totaltime = ($endtime - $starttime);
echo "This page was created in ".$totaltime." seconds";
?>
Upvotes: 10
Reputation: 178
Are you using individual statements for each record? You might want to look at LOAD DATA INFILE for a bulk update. We gained some performance when I tried that last year.
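For reference, bulk loading usually means dumping the batch to a delimited file and pointing MySQL at it. A minimal sketch in Python (the file name, table, and column list are invented for illustration; note that LOAD DATA ... REPLACE does a delete-plus-insert on key collisions, which is close to but not identical to an upsert):

```python
import csv

# Write the batch to a CSV file that MySQL can ingest in one statement.
rows = [(1, "username1", "email1", "password1"),
        (2, "username2", "email2", "password2")]
with open("users_batch.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)

# REPLACE tells MySQL to overwrite rows whose primary key already exists,
# covering the mixed insert/update case in a single bulk load.
load_sql = """
LOAD DATA LOCAL INFILE 'users_batch.csv'
REPLACE INTO TABLE users
FIELDS TERMINATED BY ','
(id, username, email, password)
"""
```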
Upvotes: 0
Reputation: 1903
You say
I actually know which rows are new and which are updated so I can also do inserts and updates seperately.
If you know without hitting the database which are INSERTs and which are UPDATEs, then running the correct statement has to be quicker than doing an INSERT ... ON DUPLICATE KEY ...
The INSERTs will be no quicker; the UPDATEs will be quicker because you don't have to attempt an INSERT first.
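The routing described above can be sketched like this (SQLite stands in for MySQL purely so the snippet is self-contained; the table and column names are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'old-name')")

new_rows = [(2, "bob"), (3, "carol")]   # rows known not to exist yet
changed_rows = [("alice", 1)]           # rows known to exist already

# Plain INSERT for the new rows: no duplicate-key handling is needed.
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", new_rows)

# Plain UPDATE for the existing rows: no insert attempt is made first.
conn.executemany("UPDATE users SET name = ? WHERE id = ?", changed_rows)

print(sorted(conn.execute("SELECT id, name FROM users")))
# [(1, 'alice'), (2, 'bob'), (3, 'carol')]
```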
Upvotes: 10
Reputation: 2916
From a performance standpoint the difference is in the number of statements: for in-memory data sets, going over the network and parsing the query is what takes most of the time, which is why packing the work into a single statement helps performance. Since you know which rows need to be inserted versus updated, I don't believe you will see any performance difference, as long as the UPDATE's WHERE clause filters on an indexed column such as the record's ID.
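One way to pay the per-statement cost only once is a single multi-row upsert instead of one statement per row. A sketch using SQLite's equivalent ON CONFLICT syntax (MySQL would use ON DUPLICATE KEY UPDATE instead; the names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'old')")

# One statement carries every row, so the network/parse cost is paid once.
conn.execute(
    "INSERT INTO users (id, name) VALUES (1, 'alice'), (2, 'bob') "
    "ON CONFLICT(id) DO UPDATE SET name = excluded.name"
)
print(sorted(conn.execute("SELECT id, name FROM users")))
# [(1, 'alice'), (2, 'bob')]
```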
Upvotes: 0
Reputation: 1548
It depends on which storage engine you're using. MyISAM is very good at selects and inserts because it can do them concurrently, but it locks the whole table when writing, so it is not so good for updates. How about benchmarking both methods and finding out which takes longer?
Upvotes: 0