Nir
Nir

Reputation: 25359

MySQL: Is it faster to use inserts and updates instead of insert on duplicate key update?

I have a cron job that updates a large number of rows in a database. Some of the rows are new and therefore inserted and some are updates of existing ones and therefore update.

I use insert on duplicate key update for the whole data and get it done in one call.

But- I actually know which rows are new and which are updated so I can also do inserts and updates seperately.

Will seperating the inserts and updates have advantage in terms of performance? What are the mechanics behind this ?

Thanks!

Upvotes: 20

Views: 23219

Answers (6)

Shen liang
Shen liang

Reputation: 1435

I Got another totally different result. INSERT ON DUPLICATE is faster than UPATE!!!

MySQL Version

innodb_version 5.6.13

protocol_version 10

version 5.6.13-enterprise-commercial-advanced

version_compile_machine x86_64

version_compile_os osx10.7

Result

SELECT udf_CreateCounterID(0,CURRENT_DATE);
SELECT @update, @updateend, @updatediff, @insertupdate, @insertupdate_end, @insertupdatediff, @keyval, @countlmt;

@update=2013-09-12 17:32:27

@updateend=2013-09-12 17:33:01

@updatediff=34

@insertupdate=2013-09-12 17:32:00

@insertdate_end=2013-09-12 17:32:27

@insertupdatediff=27

@keyval=13

@countlmt=1000000

Table

CREATE TABLE `sys_CounterID` (`exch_year` int(11) NOT NULL,
                              `nextID` int(11) NOT NULL,
                               PRIMARY KEY (`exch_year`)
                             ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Test Function

 CREATE DEFINER=`root`@`localhost` FUNCTION `udf_CreateCounterID`(exchID SMALLINT, listyear DATE) RETURNS int(10) unsigned
 BEGIN
DECLARE keyvalue INT UNSIGNED DEFAULT 0;

SET @countlmt = 1000000;
SET keyvalue = ((exchID % 512) << 9 ) + EXTRACT(YEAR FROM listyear) % 100;

SET @keyval = keyvalue;
SET @retVal =  0;

SET @count = @countlmt;
SET @insertupdate = SYSDATE();

WHILE @count > 0 DO

    INSERT INTO `sys_CounterID`(`exch_year`,nextID)
    VALUE( keyvalue, 1)
    ON DUPLICATE KEY UPDATE 
        nextID = (@retVal := nextID + 1);

    SET @count = @count - 1;

END WHILE;

SET @insertupdate_end = SYSDATE();
SET @insertupdatediff = TIMESTAMPDIFF(SECOND, @insertupdate,@insertupdate_end);


SET @count = @countlmt;
SET @update = SYSDATE();

WHILE @count > 0 DO

    UPDATE sys_CounterID 
    SET nextID = (@retVal := nextID + 1)
    WHERE exch_year = keyvalue;
    SET @count = @count - 1;
END WHILE;

SET @updateend = SYSDATE();
SET @updatediff = TIMESTAMPDIFF(SECOND, @update,@updateend);
RETURN @retVal;
END

Upvotes: 5

Tam&#225;s Pap
Tam&#225;s Pap

Reputation: 18273

In my test using ON DUPLICATE KEY UPDATE is in average 1.3 x slower than using Insert/Update. This is my test:

INSERT/UPDATE (54.07 sec)

    <?php 
       $mtime = microtime(); 
       $mtime = explode(" ",$mtime); 
       $mtime = $mtime[1] + $mtime[0]; 
       $starttime = $mtime; 
    ?> 
    <?php
    set_time_limit(0);
    $con = mysql_connect('localhost', 'root', '');
    mysql_select_db('test');

    for ($i = 1; $i <= 1000; $i = $i + 2)
    {
        mysql_query("
                    INSERT INTO users
                    VALUES(NULL, 'username{$i}', 'email.{$i}', 'password{$i}')
                    ");
    }

    for ($i = 1; $i <= 1000; $i++)
    {
        if ($i % 2 == 0)
        {
            mysql_query("
                    INSERT INTO users
                    VALUES(NULL, 'username{$i}', 'email.{$i}', 'password{$i}')
                    ");
        }
        else
        {
             mysql_query("
                        UPDATE users
                        SET (username = 'username{$i}', email = 'email{$i}', password = 'password{$i}')
                        ");
        }
    }
    ?>
    <?php 
       $mtime = microtime(); 
       $mtime = explode(" ",$mtime); 
       $mtime = $mtime[1] + $mtime[0]; 
       $endtime = $mtime; 
       $totaltime = ($endtime - $starttime); 
       echo "This page was created in ".$totaltime." seconds"; 
    ?>

ON DUPLICATE KEY UPDATE (70.4 sec)

<?php 
   $mtime = microtime(); 
   $mtime = explode(" ",$mtime); 
   $mtime = $mtime[1] + $mtime[0]; 
   $starttime = $mtime; 
?> 
<?php
set_time_limit(0);
$con = mysql_connect('localhost', 'root', '');
mysql_select_db('test');

for ($i = 1; $i <= 1000; $i = $i + 2)
{
    mysql_query("
                INSERT INTO users
                VALUES(NULL, 'username{$i}', 'email.{$i}', 'password{$i}')
                ");
}

for ($i = 1; $i <= 1000; $i++)
{
    mysql_query("
                INSERT INTO users
                VALUES({$i}, 'username{$i}', 'email.{$i}', 'password{$i}')
                ON DUPLICATE KEY UPDATE
                username = 'username{$i}', email = 'email{$i}', password = 'password{$i}'
                ");    
}
?>
<?php 
   $mtime = microtime(); 
   $mtime = explode(" ",$mtime); 
   $mtime = $mtime[1] + $mtime[0]; 
   $endtime = $mtime; 
   $totaltime = ($endtime - $starttime); 
   echo "This page was created in ".$totaltime." seconds"; 
?>

Upvotes: 10

gsharma
gsharma

Reputation: 178

Are you using individual statements for each record? You might want to look at the load data infile for a bulk update. We gained some performance last time (been a year) when I tried that.

Upvotes: 0

gdt
gdt

Reputation: 1903

You say

I actually know which rows are new and which are updated so I can also do inserts and updates seperately.

If you know without hitting the database which are INSERTs and which are UPDATEs, then running the correct statement has to be quicker than doing an INSERT ... ON DUPLICATE KEY ...

The INSERTs will be no quicker; the UPDATEs will be quicker because you don't have to attempt an INSERT first.

Upvotes: 10

Gary
Gary

Reputation: 2916

From a performance standpoint the difference is in number of statements – for in memory data sets going over network and parsing query is what is taking most of time, this is why having it in single statement helps to improve performance. Since you know which need to be inserted vs updated, I do not believe you will see any performance difference. If the Update uses a WHERE statement in which the ID of the record to be updated is indexed, you should see no performance difference.

Upvotes: 0

studioromeo
studioromeo

Reputation: 1548

It depends on which storage engine your using, MyISAM is very good at selects and inserts because it can do them concurrently, but it locks the whole table when writing so is not so good for updates. How about you try benchmarking it, and finding out which method takes longer?

Upvotes: 0

Related Questions