Out 05 2008
Speed Up MySQL: Replace or On Duplicate Key Update?
It comes to a time when you need to optimize your code to the maximum. Usually most people do not worry to much about code performance or speed, this is a big mistake.
However, when you have your own server and less performance means more memory usage, you will get to the point the hard way. Most of the time, more memory usage means more hardware failure and server slow downs and down times.
Welcome to the world of code optimizing – today I will write about the usage of the REPLACE and ON DUPLICATE KEY UPDATE MySQL functions.
These two functions are identical, and they have great potential, in fact I love how they simplify my queries to database.
REPLACE works like INSERT, however if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted.
Let’s consider the following table:
-- ---------------------------- -- Table structure for online -- ---------------------------- DROP TABLE IF EXISTS `online`; CREATE TABLE `online` ( `ip_address` bigint(10) unsigned NOT NULL, `access_time` bigint(10) NOT NULL, PRIMARY KEY (`ip_address`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; -- ---------------------------- -- Records -- ---------------------------- INSERT INTO `online` VALUES (2130706433, 1223224699); INSERT INTO `online` VALUES (3581352818, 1223218573); INSERT INTO `online` VALUES (1394046411, 1223218580); INSERT INTO `online` VALUES (1452335504, 1223218580);
This will give us:
mysql> SELECT * FROM `online`; +------------+-------------+ | ip_address | access_time | +------------+-------------+ | 2130706433 | 1223224699 | | 3581352818 | 1223218573 | | 1394046411 | 1223218580 | | 1452335504 | 1223218580 | +------------+-------------+ 4 rows in set
So, what we have here is a MySQL table that we will use to insert visitors IP’s in IPV4 format and we will record the UNIX TIME every time a unique IP accesses a page.
NOTE:
For more information about IPv4 please refer to either the PHP function ip2long or the MySQL function INET_ATON, as I’m not going into much detail about it.
An inexperienced MySQL programmer would probably do a SELECT statement to check if a particular IP was already in database and than a INSERT or UPDATE statement depending on the SELECT result.
This is where the REPLACE function comes in handy, with a single query you are able to either UPDATE or INSERT. Of course, you need to have a PRIMARY KEY or a UNIQUE index on your table.
So for the IPv4 ‘2130706433′, I can do:
REPLACE INTO `online` (`ip_address`, `access_time`) VALUES (2130706433, UNIX_TIMESTAMP());
Time token: 0.09169”
Since I already have the IPv4 2130706433 the above query will count 2 affected rows. First to DELETE the previous entry and the second to INSERT it with the new UNIX_TIMESTAMP value.
Now, if you try with an IP that’s not yet in the table, you would get 1 row count, that’s the way for you to know if a row was updated or inserted.
At the same time, you must be careful with the REPLACE function, since you cannot refer to values from the current row and use them in the new row.
Unlike the REPLACE function, with the ON DUPLICATE KEY UPDATE statement you can use the VALUES(col_name) function in the UPDATE clause to refer to column values from the INSERT portion of the INSERT/UPDATE statement.
Just like REPLACE, the ON DUPLICATE KEY UPDATE affected-rows value per row is 1 if the row is inserted as a new row and 2 if an existing row is updated.
This is how we would use ON DUPLICATE KEY UPDATE instead of REPLACE:
INSERT INTO `online` ( `ip_address`, `access_time` ) VALUES ( 2130706433, UNIX_TIMESTAMP() ) ON DUPLICATE KEY UPDATE `access_time` = UNIX_TIMESTAMP();
Time token: 0.05685”
Can you now spot the main difference now?
REPLACE ----------------------> toke 0.09169'' ON DUPLICATE KEY UPDATE -----> toke 0.05685''
Gain:
0.09169 – 0.05685 = 0.03484 seconds
The results explain it all, use ON DUPLICATE KEY UPDATE whenever you can.
Happy MySQL speed up!
Não perca os meus artigos! Subscreva a minha feed RSS.
Setembro 15th, 2009 at 13:44
One comment I need to make, the two functions are not identical, though I don’t see a reason to ever use REPLACE INTO.
ON DUPLICATE KEY can actually access multi-list values as well as accessing the original value of the row. For another example, consider your table with a “viewCount” column:
INSERT INTO `online` VALUES (2130706433, 1223224699, 1),(3581352818, 1223218573, 1),(1394046411, 1223218580, 1)(1452335504, 1223218580, 1) ON DUPLICATE KEY UPDATE viewCount = viewCount + VALUES(viewCount);
Of course, using VALUES() here won’t do you any good, you should just hard-code the 1 in there, but you get the point.
Note that ON DUPLICATE KEY adds 1 to the affected_rows for any INSERTS, and 2 to the affected_rows for any UPDATES, even in this format.
Fevereiro 26th, 2010 at 20:05
You both seem to have missed the preferred solution to the indicated scenario.
In this logging situation where you are updating ONE row each time it is much better to do an UPDATE and check the number of affected rows. If zero rows are affected then follow up with an INSERT. As you will typically have many more UPDATEs than INSERTs this would be the better way although to see any real difference you need quite significant levels of concurrency.