Reputation: 12452
Preamble:
This post is not about how to use PHP and MySQL, or how to write a script that logs some information to a database. This question is meant to work out the best solution for logging information into a MySQL database from a PHP script in the fastest possible way! So it's truly about micro-improvements. Thank you!
Situation:
I've got a PHP script working on a server, which delivers content very fast to customers. A MySQL server is available on that machine too, so this is the weapon of choice. Now I would like to track some information about the requests. Therefore, I need to log the information somehow, and I think the best solution here is a flat database table where the information can be stored.
But I need to keep the time for the log as low as possible, so that it has next to no impact on the response time, even under many simultaneous requests. The system handles between 100K and 1M requests per day, most of them during working hours (8:00 to 18:00). The current response time is about 3-5 ms, so even 1 ms more would mean an increase of 20-33%.
The database table I've created is very flat and has no extras. The only index on the table is on the `id` column, which is a `PRIMARY KEY` with `AUTO_INCREMENT`, because I would like to have a unique identifier for further jobs (more about this later). For this post and further examples, we assume a table structure like this:
| id  | d1  | d2  | d3  |
|-----|-----|-----|-----|
| 1   | foo | bar | baz |
| 2   | foo | bar | baz |
| 3   | foo | bar | baz |
| ... | ... | ... | ... |
The processing of the recorded data will be done by another script later, so there is no need for further actions on the data; it's all about the storage itself. But the table could grow to about 3M rows.
Database thoughts:
First of all I asked myself about the right database engine. My first thought was that `Memory` would be the fastest, but we would lose all entries whenever the server goes down (I also have a weekly maintenance window for installing updates and restarting the system). That must never happen. So I came back to `MyISAM` and `InnoDB`. But which one to take?
So I made a simple benchmark to see if there are big differences between these two engines. On my development machine I created three tables, one per engine, and wrote a simple script that measures the elapsed times.
```sql
CREATE TABLE `log_myisam` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `d1` varchar(3) NOT NULL,
  `d2` varchar(3) NOT NULL,
  `d3` varchar(3) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM;

CREATE TABLE `log_innodb` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `d1` varchar(3) NOT NULL,
  `d2` varchar(3) NOT NULL,
  `d3` varchar(3) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

CREATE TABLE `log_memory` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `d1` varchar(3) NOT NULL,
  `d2` varchar(3) NOT NULL,
  `d3` varchar(3) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=Memory;
```
My test script simply inserts 1,000,000 flat rows into each table and afterwards calculates the average time consumed. Here's the benchmark script:
```php
// $sql is an open mysqli connection to the benchmark database.
foreach( array("myisam", "innodb", "memory") as $table ) {
    $query = "INSERT INTO log_{$table} SET d1='foo',d2='bar',d3='baz'";
    $start = microtime(true);
    for( $i = 0; $i < 1000000; $i++ ) {
        $sql->query($query);
    }
    $end = microtime(true);
    $results[$table] = $end - $start;
}
```
As expected, the `Memory` table was by far the fastest one. But even `MyISAM` beats `InnoDB` every time. This makes sense to me, because `MyISAM` lacks support for things like `foreign keys` and `transactions`, so there is less functional overhead in this engine.
What really surprised me is the fact that the `Memory` table is nearly twice the size of the other tables. At this point I'm not sure why. The results:
|                                | InnoDB    | MyISAM    | Memory    |
|--------------------------------|-----------|-----------|-----------|
| time for insert                | 133.92 s  | 101.00 s  | 79.351 s  |
| avg. per entry                 | 0.1392 ms | 0.1010 ms | 0.0794 ms |
| time saved vs. previous column | 0.0000 %  | 24.585 %  | 21.436 %  |
| table size                     | 35.6 MB   | 29.9 MB   | 55.9 MB   |
But as far as I know, `MyISAM` locks the whole table while executing an `INSERT`. This could be problematic with many simultaneous requests. But I don't know how to benchmark this.
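One way to approximate such a benchmark is to fork several worker processes that insert in parallel and time the whole run. A minimal sketch, assuming the `log_myisam` table and the credentials from the examples in this post; it requires the `pcntl` extension (CLI only), and the worker/insert counts are placeholders:

```php
$workers = 8;               // number of simultaneous clients
$insertsPerWorker = 10000;  // inserts per client

$start = microtime(true);
for ($w = 0; $w < $workers; $w++) {
    if (pcntl_fork() === 0) {
        // Child process: every worker needs its own connection.
        $sql = new mysqli('localhost', 'root', '', 'benchmark');
        for ($i = 0; $i < $insertsPerWorker; $i++) {
            $sql->query("INSERT INTO log_myisam SET d1='foo',d2='bar',d3='baz'");
        }
        $sql->close();
        exit(0);
    }
}
// Parent: wait until all children have finished.
while (pcntl_waitpid(0, $status) !== -1);
printf("%d workers x %d inserts each: %.3f s\n",
    $workers, $insertsPerWorker, microtime(true) - $start);
```

Comparing the total times for `MyISAM` against `InnoDB` at different worker counts should make the effect of table-level locking visible.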
Another question for me is how the index on the `id` column affects the run time. Does it help, or does it slow things down? So I ran the benchmark script again, after removing the `PRIMARY KEY` index and the `AUTO_INCREMENT` option from the `id` column.
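(The exact statement isn't shown here; hypothetically, a single `ALTER` like the following would do it, since the `AUTO_INCREMENT` attribute has to be removed in the same statement that drops the key.)

```php
// Hypothetical statement for the second run: strip AUTO_INCREMENT
// and drop the primary key in one ALTER.
$sql->query('ALTER TABLE log_myisam MODIFY id int(10) unsigned NOT NULL, DROP PRIMARY KEY');
```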
|                 | InnoDB    | MyISAM    | Memory    |
|-----------------|-----------|-----------|-----------|
| time with id    | 133.92 s  | 101.00 s  | 79.351 s  |
| avg. with id    | 0.1392 ms | 0.1010 ms | 0.0794 ms |
| time without id | 131.88 s  | 91.868 s  | 73.014 s  |
| avg. without id | 0.1319 ms | 0.0919 ms | 0.0701 ms |
`MyISAM` seems to take the most advantage of dropping the index. But the gap between the two results is not as wide as I expected.
Query thoughts:
The query itself has been kept simple. I would not know how to improve it any further.

```sql
INSERT INTO log_myisam
SET
    d1 = 'foo',
    d2 = 'bar',
    d3 = 'baz'
```
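One micro-improvement that comes to mind: if rows can be buffered somewhere before they hit the database, several of them can be combined into a single multi-row `INSERT`, which usually beats the same number of single-row statements. A minimal sketch, assuming the `log_myisam` table from above and an open mysqli connection in `$sql`; the buffered values and chunk size are placeholders:

```php
// Collect rows first (placeholder data), then flush them in chunks,
// each chunk as one multi-row INSERT statement.
$buffer = [];
for ($i = 0; $i < 1000; $i++) {
    $buffer[] = "('foo','bar','baz')";
}
foreach (array_chunk($buffer, 100) as $chunk) {
    $sql->query('INSERT INTO log_myisam (d1, d2, d3) VALUES ' . implode(',', $chunk));
}
```

Of course this only helps if it is acceptable to buffer rows across requests instead of writing each one immediately.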
PHP script thoughts:
One thing that costs time is the connection itself. Because of that, I would go with a persistent connection. I've used `mysqli`, of course. But is there a difference between `procedural` and `oop` usage? I made a simple benchmark again.
```php
$startProcedural = microtime(true);
for( $i = 0; $i < 1000; $i++ ) {
    $sql = mysqli_connect('localhost', 'root', '', 'benchmark');
    mysqli_close($sql);
    unset($sql);
}
$endProcedural = microtime(true);

$startOop = microtime(true);
for( $i = 0; $i < 1000; $i++ ) {
    $sql = new mysqli('localhost', 'root', '', 'benchmark');
    $sql->close();
    unset($sql);
}
$endOop = microtime(true);
```
Without a persistent connection the difference is quite visible! The `oop` style is a bit faster, and that is with only 1,000 connections.

```
procedural: 0.326150 s
oop:        0.256580 s
```
With a persistent connection enabled, both versions are nearly the same, and the whole connection time dropped to roughly a third of the non-persistent one. So it seems the best way is to go with a persistent connection here.

```
procedural: 0.093201 s
oop:        0.092088 s
```
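For reference, `mysqli` enables persistent connections by prefixing the host name with `p:` (available since PHP 5.3):

```php
// Persistent variant of the connection used above: the "p:" prefix
// makes mysqli reuse a pooled connection instead of opening a new one.
$sql = new mysqli('p:localhost', 'root', '', 'benchmark');
```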
My temporary conclusion:
Currently I'm at a logging time of 0.204 ms (as an average over 100,000 inserts).
For now I would say the following:

- `MyISAM`, because it's the best mix of speed, safety and size
- keep the `id` column, because its index gives more speed for the later processing
- `oop` style

But there are some open questions for me. Did I make the right decisions? Is `MyISAM` blocking the execution? Is there a way to use `Memory` after all? Could a persistent connection have side effects, like slowing things down through higher memory usage? ...
I would really appreciate your ideas or tips for even faster logging. Maybe I'm totally wrong at some points. Please let me know if you have any experience with this kind of thing.
Thanks in advance!
Upvotes: 0
Views: 811
Reputation: 24959
Regardless of the engine, and I am glad not to touch that hot potato, the fastest way by far will be to use `LOAD DATA INFILE` from CSV files that you create. So the CSV is created in append mode as the traffic comes in. Have a mechanism to close one version down, get a new incrementor, and start fresh, perhaps hourly. Your files may look like this when done:
/tmp/csvfiles/traffic_20160725_00.csv
...
/tmp/csvfiles/traffic_20160725_23.csv
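A minimal sketch of that append step, assuming the hourly naming scheme above; `fputcsv()` combined with an exclusive lock keeps simultaneous requests from interleaving lines (field values are placeholders matching the `d1`/`d2`/`d3` examples):

```php
// Append one request's data to the current hour's CSV file.
$file = sprintf('/tmp/csvfiles/traffic_%s.csv', date('Ymd_H'));
$fh = fopen($file, 'ab');
if ($fh !== false) {
    if (flock($fh, LOCK_EX)) {
        fputcsv($fh, ['foo', 'bar', 'baz']);
        flock($fh, LOCK_UN);
    }
    fclose($fh);
}
```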
That just got you 24 hours of traffic. Upload as described above whenever you feel like it, and rest assured that the load will be faster than any `INSERT` call (by factors that will blow your mind away). An added bonus is that your CSVs, let's just call them text files, are pretty much ready to rock and roll into a NoSQL solution when you decide that is where they might belong anyway.
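The upload itself can be a single statement per finished file. A sketch, assuming the `log_myisam` table from the question and a file readable by the MySQL server (the user needs the `FILE` privilege; for a client-side file, `LOAD DATA LOCAL INFILE` would be used instead):

```php
// Bulk-load one finished CSV file into the log table.
$sql = new mysqli('localhost', 'root', '', 'benchmark');
$sql->query("
    LOAD DATA INFILE '/tmp/csvfiles/traffic_20160725_00.csv'
    INTO TABLE log_myisam
    FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
    LINES TERMINATED BY '\\n'
    (d1, d2, d3)
");
```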
Note: I am a big fan of Events. I have three links off my profile page for them as examples. However, Events and stored procedures are not allowed to use `LOAD DATA INFILE`. So I have separate agents that do that. Those agents are nice because over time I naturally add different functionality to them. With the combo of Events and agents, there is never a need to use cron or other O/S schedulers.
The takeaway: never use `INSERT` for this.
Upvotes: 2