Alan

Reputation:

Log to file via PHP or log to MySQL database - which is quicker?

I have a database driven website serving about 50,000 pages.

I want to track each webpage/record hit. I will do this by creating logs, and then batch processing the logs once a day. I am not concerned with how I will do the batch process, only with the quickest way to log.

How would you log it, and which do you think is quicker:

a) Use PHP to append to the end of a text log file.

b) Use MySQL to INSERT INTO a non-indexed log table.

Upvotes: 11

Views: 16353

Answers (15)

alexandrul

Reputation: 13246

You could try both ways using log4php, which supports:

  • Configuration through xml and properties file (same structure as log4j).
  • File, RollingFile, DailyFile, Echo, Console, Mail, PEAR::Db, PHP error, Syslog or NT events and Socket appenders.
  • Simple, TTCC, Pattern, Html and Xml Layouts.
  • Nested (NDC) and Mapped (MDC) Diagnostic Contexts.
  • Switchable internal debug.

Regarding logging into a file, you could improve performance by buffering the write requests.
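As an illustration of the buffering idea, here is a minimal sketch in plain PHP (not log4php's own appenders): hits are collected in memory during the request and flushed with a single append at shutdown. The file name and line format are placeholders.

<?php
// Minimal sketch of buffered logging: collect hits in memory and
// flush them with one append at the end of the request.
$logBuffer = array();

function log_hit($pageId)
{
    global $logBuffer;
    // Defer the actual write; just remember the entry.
    $logBuffer[] = time() . "\t" . $pageId . "\n";
}

function flush_hits()
{
    global $logBuffer;
    if (!empty($logBuffer))
    {
        // One fopen/fwrite per request instead of one per hit.
        file_put_contents('/tmp/hits.log', implode('', $logBuffer), FILE_APPEND);
    }
}
register_shutdown_function('flush_hits');

log_hit(42);
log_hit(43);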

Upvotes: 7

Andrew Moore

Reputation: 95334

If you are using either file-based logging or database-based logging, your biggest performance hit will be file/table locking. Basically, if client A and client B connect within a relatively small time frame, client B is stuck waiting for the lock to be released on the hits file/table before continuing.

The problem with a file-based mechanism is that file locking is essential to ensure that your hits don't get corrupted. The only way around that is to implement a queue to do a delayed write to the file.
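To make the locking point concrete, here is a rough sketch of a locked append in PHP: every concurrent client blocks on flock() until the previous writer releases the lock, which is exactly the contention described above (the file name is a placeholder).

<?php
// Sketch of an append protected by an exclusive lock.
function log_hit_locked($line)
{
    $fp = fopen('/tmp/hits.log', 'a');
    if ($fp === false)
    {
        return;
    }
    if (flock($fp, LOCK_EX))          // blocks while another client holds the lock
    {
        fwrite($fp, $line . "\n");
        flock($fp, LOCK_UN);
    }
    fclose($fp);
}

log_hit_locked(date('c') . "\tpage=42");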

With database logging, you can at least do the following [MySQL using MyISAM]:

INSERT DELAYED INTO `hits` ...

See 12.2.5.2. INSERT DELAYED Syntax for more information.

Upvotes: 1

endian

Reputation: 4294

I read an article in the C++ Users Journal, years ago, about logging performance. Whether you use a DB or files, the best thing to do is write unformatted data that can be "inflated" into meaningful data when (and more likely if) you need to view the logs. The vast majority of the cost of logging is in formatting the strings that are written to the destination, and most of the time that cost is wasted - the logs are never read.
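A rough sketch of that idea for a simple hit log: write only raw values at request time and defer all string formatting to whenever the log is actually read. The field layout and file name here are made up for illustration.

<?php
// Log raw values only; no sprintf() at request time.
function log_hit_raw($fp, $pageId, $userId)
{
    fwrite($fp, time() . "\t" . $pageId . "\t" . $userId . "\n");
}

// "Inflate" a raw line into a readable message only when the log is read.
function inflate_line($line)
{
    list($ts, $pageId, $userId) = explode("\t", trim($line));
    return sprintf('[%s] user %d viewed page %d',
                   date('Y-m-d H:i:s', (int)$ts), $userId, $pageId);
}

$fp = fopen('/tmp/hits.raw', 'a');
log_hit_raw($fp, 42, 7);
fclose($fp);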

I can dig out the article reference if it's useful to you.

Upvotes: 2

fireweasel

Reputation: 349

You should try SQLite. It will give you both the speed of writing to a file as well as the power of a database.
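A minimal sketch using PDO's SQLite driver (this assumes the pdo_sqlite extension is available; the table and file names are placeholders):

<?php
// Sketch: log hits to a local SQLite file via PDO.
$db = new PDO('sqlite:/tmp/hits.sqlite');
$db->exec('CREATE TABLE IF NOT EXISTS hits (ts INTEGER, page_id INTEGER)');

$stmt = $db->prepare('INSERT INTO hits (ts, page_id) VALUES (?, ?)');
$stmt->execute(array(time(), 42));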

Upvotes: 2

Richard Harrison

Reputation: 19393

Use a database - it is the only sane option, even if it takes a little longer. Once you start with log files you are on a track that will cause you pain - e.g. moving servers, file permissions, and it precludes load balancing, etc.

If you've got the database open then I reckon it would probably be quicker to insert a single row.

However, with anything performance related, the only way to be sure is to write a simple test and measure it.

Update: I've done a quick test - and sure enough, if you have to open and close the file for every write, it's about the same speed as the database, or slower, in a test of 10,000 lines.

However, when you start to have multiple processes doing this, it slows down, as can be seen below. These figures are for 10 concurrent processes (all timings in seconds).

DB time: 2.1695
DB time: 2.3869
DB time: 2.4305
DB time: 2.5864
DB time: 2.7465
DB time: 3.0182
DB time: 3.1451
DB time: 3.3298
DB time: 3.4483
DB time: 3.7812
File open time: 0.1538
File open time: 0.5478
File open time: 0.7252
File open time: 3.0453
File open time: 4.2661
File open time: 4.4247
File open time: 4.5484
File open time: 4.6319
File open time: 4.6501
File open time: 4.6646
Open close file time: 11.3647
Open close file time: 12.2849
Open close file time: 18.4093
Open close file time: 18.4202
Open close file time: 21.2621
Open close file time: 22.7267
Open close file time: 23.4597
Open close file time: 25.6293
Open close file time: 26.1119
Open close file time: 29.1471

function debug($d)
{
    // Prints (and appends to dbg.txt) the time elapsed since the previous call.
    static $start_time = NULL;

    if( $start_time === NULL )
    {
        $start_time = microtime(true);
        return 0;
    }

    printf("$d time: %.4f\n", microtime(true) - $start_time);
    $fp = @fopen('dbg.txt','a');
    if ($fp)
    {
        fprintf($fp,"$d time: %.4f\n", microtime(true) - $start_time);
        fclose($fp);
    }

    $start_time = microtime(true);
}

// Benchmark 1: open the log file once, do 10,000 appends, close it once.
function tfile()
{
    $fp = @fopen('t1.txt','a');
    for ($i=0;$i<10000;$i++)
    {
        $txt = $i."How would you log, which do you think is quicker:How would you log, which do you think is quicker:";
        fwrite($fp,$txt);
    }
    fclose($fp);
}
// Benchmark 2: open and close the log file for every one of the 10,000 appends.
function tfile_openclose()
{
    for ($i=0;$i<10000;$i++)
    {
        $fp = @fopen('t1.txt','a');
        $txt = $i."How would you log, which do you think is quicker:How would you log, which do you think is quicker:";
        fwrite($fp,$txt);
        fclose($fp);
    }
}

// Benchmark 3: 10,000 single-row INSERTs (old mysql_* API, as originally posted).
function tdb()
{
    $db = mysql_connect('localhost','tremweb','zzxxcc');

    $select_db = mysql_select_db('scratch');

    if (!$select_db) 
        die('Error selecting database.');

    for ($i=0;$i<10000;$i++)
    {
        $txt = $i."How would you log, which do you think is quicker:How would you log, which do you think is quicker:";
        mysql_query("INSERT INTO tlog values('".$txt."')");
    }
}

debug("");

tfile();
debug("File open");

tfile_openclose();
debug("Open close file");

tdb();
debug("DB");

Upvotes: 4

Ian

Reputation: 4258

If this is for a database-driven site, why not just use the built-in logging capabilities of Apache or IIS, together with a suitable reporting tool such as AWStats? And beyond that, there's always Google Analytics.

AWStats and web server logging are my preference - you essentially get them for free anyway. Even if you're not after traffic analysis, you could still consider parsing the Apache access log file yourself for whatever batch processing you need to do.
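For the batch-processing side, here is a rough sketch of parsing an Apache access log in PHP and counting hits per URL. The regex targets the common/combined log format and the log path is a placeholder.

<?php
// Sketch: count hits per URL by parsing Apache's access log directly.
// Capture groups: 1=host, 2=date, 3=method, 4=path, 5=status, 6=bytes.
$pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)/';
$counts  = array();

$fp = fopen('/var/log/apache2/access.log', 'r') or die('cannot open access log');
while (($line = fgets($fp)) !== false)
{
    if (preg_match($pattern, $line, $m))
    {
        $path   = $m[4];
        $status = $m[5];
        if ($status == '200')
        {
            $counts[$path] = (isset($counts[$path]) ? $counts[$path] : 0) + 1;
        }
    }
}
fclose($fp);

arsort($counts);
print_r(array_slice($counts, 0, 10, true));   // top 10 pages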

Upvotes: 0

Gary Richardson

Reputation: 16441

I've done something similar. I log each record to a separate file, then I have a batch process that grabs the files, puts them into a tar file and uploads them to the central log server (in my case, S3 :)).

I generate random file names for each log entry. I do this to avoid locking files for rotation. It's really easy to archive/delete this way.

I use json as my log format instead of the typical white space delimited log files. This makes it easier to parse and add fields in the future. It also means it's easier for me to write one entry per file than to append multiple records to the same file.
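A rough sketch of that approach: each hit becomes its own JSON file with a collision-resistant name, so no locking or rotation is needed. The spool directory and fields here are placeholders.

<?php
// Sketch: write each log entry as its own JSON file in a spool directory.
// A unique file name per entry avoids any need for locking or rotation.
function log_hit_json($pageId, $userId)
{
    $entry = array(
        'ts'      => time(),
        'page_id' => $pageId,
        'user_id' => $userId,
    );
    $file = '/var/spool/hits/' . uniqid('hit_', true) . '.json';
    file_put_contents($file, json_encode($entry));
}

log_hit_json(42, 7);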

I've also used log4php + syslog-ng to centralize logging in real time. I have log4php log to syslog, which then forwards the logs to my central server. This is really useful on larger clusters. One caveat is that there's a length limit to syslog messages, so you risk longer messages being truncated.

Upvotes: 1

Gaurav

Reputation: 486

As others have mentioned, it depends on lots of things such as traffic, disk speed, etc. You'll have to test both scenarios.

While testing MySQL, try both MyISAM and InnoDB. In theory, InnoDB will perform better as it has row-level locking.

Upvotes: 0

Cory House

Reputation: 15045

A few considerations:

  1. Do you think you'll want to join log data with other data in the database? If so, the overhead of a db insert is likely justified so existing relationships can be easily leveraged.
  2. Would logging the data in the database allow you to reduce the amount of data you're logging greatly (due to existing relationships in the db)? For example, a log in the database of user activity could simply be a table containing a userid, activityid, and a timestamp (see the sketch after this list). A log this lean wouldn't be human readable as a flat file. Depending on your needs, you'd need to capture at least some of the user's data in the log file to ensure it is useful and human readable on its own.
  3. Any chance you'll want to leverage this log data in the front end or via an admin tool down the road? If so, DB write is likely preferable.
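As an illustration of point 2, here is a minimal sketch of such a lean activity log. Table and column names are invented, and PDO is used here rather than the old mysql_* API used elsewhere in this thread.

<?php
// Sketch of a lean, relational activity log: just foreign keys and a timestamp.
$db = new PDO('mysql:host=localhost;dbname=app', 'user', 'password');

$db->exec('CREATE TABLE IF NOT EXISTS user_activity_log (
    user_id     INT UNSIGNED NOT NULL,
    activity_id INT UNSIGNED NOT NULL,
    created_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
)');

$stmt = $db->prepare('INSERT INTO user_activity_log (user_id, activity_id) VALUES (?, ?)');
$stmt->execute(array(7, 3));

// Human-readable output comes from joining back to the existing user and
// activity tables, not from the log rows themselves.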

Upvotes: 0

SquareCog

Reputation: 19666

  1. Write to file. Rotate logs.

  2. Batch load the file to the database on a scheduled basis.

There are many, many reasons to choose this architecture -- ease of scaling (write to many logs, load them to the db), no reliance on a single point of failure in the database (if something goes wrong, you just accumulate logs for a while), the ability to do cleaning and non-trivial parsing at load time without burdening your production servers, and more.
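A rough sketch of step 2, assuming a tab-separated hit log and MySQL's LOAD DATA statement (local-infile must be enabled on both client and server; the file, table, and column names are placeholders):

<?php
// Sketch: bulk-load a rotated log file into MySQL in one statement,
// instead of thousands of individual INSERTs.
$db = new PDO(
    'mysql:host=localhost;dbname=stats',
    'user',
    'password',
    array(PDO::MYSQL_ATTR_LOCAL_INFILE => true)
);

$sql = "LOAD DATA LOCAL INFILE '/var/log/hits/hits.2009-01-01.log'
        INTO TABLE hits
        FIELDS TERMINATED BY '\\t'
        LINES TERMINATED BY '\\n'
        (ts, page_id)";
$db->exec($sql);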

Upvotes: 17

willasaywhat

Reputation: 2372

It all depends on your infrastructure and limitations. If the disk is slow, writing will be slow. If the SQL server is bogged down by requests, the insert will be slow. A flat file is probably the best way to go, but I would write your code, or use existing code (PEAR::Log), so you can change the provider and storage method at will.
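Whether you use PEAR::Log or roll your own, the point is a swappable backend. Here is a hypothetical, minimal version of that abstraction (the interface and class names are invented for illustration, not PEAR::Log's API):

<?php
// Hypothetical sketch of a swappable logging backend: calling code
// only ever depends on HitLogger, so storage can change at will.
interface HitLogger
{
    public function log($message);
}

class FileHitLogger implements HitLogger
{
    private $path;
    public function __construct($path) { $this->path = $path; }
    public function log($message)
    {
        file_put_contents($this->path, $message . "\n", FILE_APPEND);
    }
}

class DbHitLogger implements HitLogger
{
    private $stmt;
    public function __construct(PDO $db)
    {
        $this->stmt = $db->prepare('INSERT INTO hits (line) VALUES (?)');
    }
    public function log($message)
    {
        $this->stmt->execute(array($message));
    }
}

// Swap the storage method without touching the calling code.
$logger = new FileHitLogger('/tmp/hits.log');
$logger->log(date('c') . "\tpage=42");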

Upvotes: 0

Eric Hogue

Reputation: 8920

I would use a Delayed Insert into MySQL. This way you don't have to wait for the insert to finish.

Upvotes: 6

david

Reputation:

I'd recommend you test both with a few test cases.

I would assume a flat file would be faster, because that's really what the DB is doing anyway - it's just writing to a file. The only advantage I can think of is that if the database handles concurrent writes better, you might get better results.

Upvotes: 1

vaske

Reputation: 9542

Writing to a file will be quicker, but writing to the DB will be better.

Upvotes: 0

Robert Rouse

Reputation: 4851

I would expect a flat file to be faster to write to.

Upvotes: 2
