eisbehr

Reputation: 12452

What is the fastest way to log data to a MySQL server within a PHP script?

Preamble:

This post is not about how to use PHP and MySQL, or how to write a script that logs some information to a database. This question is meant to work out the best solution for logging information into a MySQL database from a PHP script as fast as possible. So it's truly about micro-improvements. Thank you!


Situation:

I've got a PHP script running on a server that delivers content to customers very fast. A MySQL server is available on the same machine, so it is the weapon of choice. Now I would like to track some information about the requests. Therefore I need to log the information somehow, and I think the best solution here is a flat database table where the information can be stored.

But I need to keep the time as low as possible so the logging has next to no impact on the response time, even with many simultaneous requests. The system handles between 100K and 1M requests per day, most of them during working hours (8:00 - 18:00). The current response time is about 3-5 ms, so even 1 ms would mean an increase of 20%.

The database table I've created is very flat and has no extras. The only index on the table is on the id column, which is a PRIMARY KEY with AUTO_INCREMENT, because I would like to have a unique identifier for further jobs (more about this later). For this post and the following examples, we assume a table structure like this:

| id | d1  | d2  | d3  |
|----|-----|-----|-----|
| 1  | foo | bar | baz |
| 2  | foo | bar | baz |
| 3  | foo | bar | baz |
| ...

The processing of the recorded data will be done by another script later. So there is no need to do anything further with the data; it's all about the storage itself. But the table could grow to about 3M rows.


Database thoughts:

First of all I asked myself about the right storage engine. My first thought was that Memory would be the fastest, but we would lose all entries whenever the server goes down (I have a weekly maintenance window for installing updates and restarting the system, too). That must never happen. So I came back to MyISAM and InnoDB. But which one to take?

So I made a simple benchmark to see if there were big differences between these two engines. I created three tables, each with a different engine, on my development machine and wrote a simple script that measures the times.

CREATE TABLE `log_myisam` (
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `d1` varchar(3) NOT NULL,
    `d2` varchar(3) NOT NULL,
    `d3` varchar(3) NOT NULL,
    PRIMARY KEY (`id`)
) ENGINE=MyISAM;

CREATE TABLE `log_innodb` (
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `d1` varchar(3) NOT NULL,
    `d2` varchar(3) NOT NULL,
    `d3` varchar(3) NOT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB;

CREATE TABLE `log_memory` (
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `d1` varchar(3) NOT NULL,
    `d2` varchar(3) NOT NULL,
    `d3` varchar(3) NOT NULL,
    PRIMARY KEY (`id`)
) ENGINE=Memory;

My test script simply inserts 1,000,000 flat rows into each table and afterwards calculates an average of the consumed time. Here's the benchmark script:

// Credentials are placeholders; adjust to your environment.
$sql = new mysqli('localhost', 'root', '', 'benchmark');
$results = [];

foreach( array("myisam", "innodb", "memory") as $table ) {
    $query = "INSERT INTO log_{$table} SET d1='foo',d2='bar',d3='baz'";

    // Time 1,000,000 identical single-row inserts per engine.
    $start = microtime(true);
    for( $i = 0; $i < 1000000; $i++ ) {
        $sql->query($query);
    }
    $end = microtime(true);

    $results[$table] = $end - $start;
}

As expected, the Memory table was by far the fastest. But even MyISAM was consistently faster than InnoDB. This makes sense to me, because MyISAM lacks support for things like foreign keys and transactions, so there is less functional overhead in this engine.

What really surprised me is that the Memory table is nearly twice the size of the other tables. At this point I'm not sure why; a possible explanation is that the MEMORY engine uses a fixed-length row format, so every VARCHAR is stored padded to its full width. The results:

|                         | InnoDB    | MyISAM    | Memory    |
|-------------------------|-----------|-----------|-----------|
| time for insert         | 133.92 s  | 101.00 s  | 79.351 s  |
| avg. per entry          | 0.1392 ms | 0.1010 ms | 0.0794 ms |
| time saved vs. previous | 0.0000 %  | 24.585 %  | 21.436 %  |
| table size              | 35.6 MB   | 29.9 MB   | 55.9 MB   |

But as far as I know, MyISAM locks the whole table while executing an INSERT. This could be problematic with many simultaneous requests. But I don't know how to benchmark this.
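One rough way to get a feel for the locking behavior could be to fork several client processes that insert concurrently and compare the wall-clock time per engine. A minimal sketch, assuming the pcntl extension is available (CLI only) and the tables from above; the worker and insert counts are arbitrary:

// Hypothetical concurrency benchmark: fork $workers clients, each
// inserting $inserts rows into the same MyISAM table.
$workers = 8;
$inserts = 10000;

$start = microtime(true);
for( $w = 0; $w < $workers; $w++ ) {
    $pid = pcntl_fork();
    if( $pid === 0 ) {
        // Child: open its own connection, never share one across a fork.
        $sql = new mysqli('localhost', 'root', '', 'benchmark');
        for( $i = 0; $i < $inserts; $i++ ) {
            $sql->query("INSERT INTO log_myisam SET d1='foo',d2='bar',d3='baz'");
        }
        $sql->close();
        exit(0);
    }
}
// Parent: wait for all children, then report the wall-clock time.
for( $w = 0; $w < $workers; $w++ ) {
    pcntl_wait($status);
}
printf("%d inserts in %.3f s\n", $workers * $inserts, microtime(true) - $start);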

Another question for me is how the index on the id column affects the run time. Does it help, or does it slow things down? So I ran the benchmark script again, after removing the PRIMARY KEY and the AUTO_INCREMENT option from the id column.

|                 | InnoDB    | MyISAM    | Memory    |
|-----------------|-----------|-----------|-----------|
| time with id    | 133.92 s  | 101.00 s  | 79.351 s  |
| avg. with id    | 0.1392 ms | 0.1010 ms | 0.0794 ms |
|-----------------|-----------|-----------|-----------|
| time without id | 131.88 s  | 91.868 s  | 73.014 s  |
| avg. without id | 0.1319 ms | 0.0919 ms | 0.0701 ms |

MyISAM seems to profit the most from dropping the index. But the gap between the two runs is not as wide as I expected.
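For anyone reproducing this, removing the index could look like the statement below (shown for the MyISAM table). Both changes have to happen in one ALTER, because MySQL requires an AUTO_INCREMENT column to be indexed:

ALTER TABLE `log_myisam`
    MODIFY `id` int(10) unsigned NOT NULL,  -- drops the AUTO_INCREMENT attribute
    DROP PRIMARY KEY;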


Query thoughts:

The query itself has been kept simple. I wouldn't know how to improve it any further.

INSERT INTO log_myisam
SET
    d1 = 'foo',
    d2 = 'bar',
    d3 = 'baz'
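One micro-optimization that might be worth benchmarking, if several log rows could be buffered and written together (for example by a collecting process), is a multi-row INSERT, since it amortizes the per-statement overhead over many rows:

INSERT INTO log_myisam (d1, d2, d3)
VALUES
    ('foo', 'bar', 'baz'),
    ('foo', 'bar', 'baz'),
    ('foo', 'bar', 'baz');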

PHP script thoughts:

One thing that costs time is the connection itself. Because of that, I would go with a persistent connection. I've used mysqli, of course. But is there a difference between procedural and OOP usage? I made a simple benchmark again.

// Time 1,000 connect/disconnect cycles in procedural style.
$startProcedural = microtime(true);
for( $i = 0; $i < 1000; $i++ ) {
    $sql = mysqli_connect('localhost', 'root', '', 'benchmark');
    mysqli_close($sql);
    unset($sql);
}
$endProcedural = microtime(true);

// The same in OOP style.
$startOop = microtime(true);
for( $i = 0; $i < 1000; $i++ ) {
    $sql = new mysqli('localhost', 'root', '', 'benchmark');
    $sql->close();
    unset($sql);
}
$endOop = microtime(true);

Without a persistent connection the difference is quite visible! The OOP style is a bit faster, and this is with only 1,000 connections.

procedural: 0.326150 s
oop:        0.256580 s

With a persistent connection enabled, both versions are nearly the same. But the whole connection time dropped to about a third of the non-persistent one. So the best way seems to be a persistent connection here.

procedural: 0.093201 s
oop:        0.092088 s
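For anyone reproducing this: mysqli opens a persistent connection when the host name is prefixed with p: (available since PHP 5.3), so the persistent runs used something like:

$sql = new mysqli('p:localhost', 'root', '', 'benchmark');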

My temporary conclusion:

Currently I'm at a logging time of 0.204 ms (as an average over 100,000 inserts).

As for now I would say the following:

  • stick with MyISAM, since it was consistently faster than InnoDB
  • keep the id column with PRIMARY KEY and AUTO_INCREMENT, because I need the unique identifier anyway
  • use a persistent connection via the OOP mysqli interface

But there are some open questions for me. Did I make the right decisions? Is MyISAM blocking the execution? Is there a way to use Memory after all? Could a persistent connection have any side effects, like slowing things down through higher memory usage? ...

I would really appreciate your ideas or tips for even faster logging. Maybe I'm totally wrong on some points. Please let me know if you have any experience with this kind of thing.

Thanks in advance!

Upvotes: 0

Views: 811

Answers (1)

Drew

Reputation: 24959

Regardless of the engine, and I am glad not to touch that hot potato, the fastest way by far will be to use LOAD DATA INFILE from CSV files that you create. So the CSV is created in append mode as the traffic comes in. Have a mechanism to close one version down, get a new incrementor, and start fresh, perhaps hourly. Your files may look like this when done:

/tmp/csvfiles/traffic_20160725_00.csv
...
/tmp/csvfiles/traffic_20160725_23.csv

That just got you 24 hours of traffic. Load them with LOAD DATA INFILE whenever you feel like it, and rest assured of two things:

  • You always have a backup
  • You will always outperform an INSERT call (by factors that will blow your mind away).

An added bonus is that your CSV files, let's just call them text files, are pretty much ready to rock and roll into a NoSQL solution when you decide that is where they might belong anyway.
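A minimal sketch of that append-and-load cycle, assuming the table from the question and hypothetical paths/credentials; FILE_APPEND with LOCK_EX keeps concurrent writers from interleaving short lines:

// Per request: append one line to the current hour's CSV file.
$line = sprintf("%s,%s,%s\n", 'foo', 'bar', 'baz');
file_put_contents('/tmp/csvfiles/traffic_' . date('Ymd_H') . '.csv',
                  $line, FILE_APPEND | LOCK_EX);

// Later, from a separate agent: bulk-load a finished file.
$sql = new mysqli('localhost', 'root', '', 'benchmark');
$sql->query("LOAD DATA INFILE '/tmp/csvfiles/traffic_20160725_00.csv'
    INTO TABLE log_myisam
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\\n'
    (d1, d2, d3)");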

Note: I am a big fan of Events. I have three links off my profile page to them as examples. However, Events and stored procedures are not allowed to use LOAD DATA INFILE, so I have separate agents that do that. Those agents are nice because over time I naturally add different functionality to them. With the combo of Events and agents, there is never a need to use cron or other OS schedulers.

The takeaway: never use INSERT for this.

Upvotes: 2
