NVG

Reputation: 3313

Database logs vs file logs

I have created a PHP+MySQL web app and I am now trying to implement a logging system to store and track some actions of each user.

The purpose is the following: track the activity of each user's session by logging IP+time+action, then see which pages the user accessed later on by logging time+pagename; for each user there will be one file per month, named log{userid}_{month}.log.
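
If it helps, here is roughly the write path I have in mind - the tab-separated line format and the logs/ directory are just placeholders:

    <?php
    // Append one event to the per-user monthly file (log{userid}_{month}.log).
    // FILE_APPEND adds to the end of the file; LOCK_EX avoids interleaved
    // writes from concurrent requests.
    $path = sprintf('logs/log%d_%02d.log', $userId, (int) date('n'));
    $line = sprintf("%s\t%s\t%s\n", $_SERVER['REMOTE_ADDR'], date('c'), $action);
    file_put_contents($path, $line, FILE_APPEND | LOCK_EX);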

Each log will then be viewed only by the website owner, through a custom admin panel, and the data will be used only for security purposes (as in to show to the user if he logged in from a different IP or if someone else logged in from a different IP and to see which areas of the website the user accessed during his login session).

Currently I have a MySQL MyISAM table where I store the userid, IP, time, and action. The app has not launched yet, but we intend to have a very large number of users (over 100k), and using a database for this feels like suicide.

So what do you suggest? How should the logging be done: using files, using a table in the current database, or using a separate database? Are there any file-logging frameworks available for PHP?

And how should the reading of the file be done then? Row by row?
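
Something like this, streaming with fgets() instead of loading the whole file into memory (again assuming the tab-separated format above)?

    <?php
    // Stream the log file line by line for display in the admin panel.
    $fh = fopen($path, 'r');
    if ($fh !== false) {
        while (($line = fgets($fh)) !== false) {
            [$ip, $time, $action] = explode("\t", rtrim($line, "\n"), 3);
            // ... render the row in the admin panel ...
        }
        fclose($fh);
    }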

Thank you

Upvotes: 16

Views: 6079

Answers (6)

Donovan Solms

Reputation: 961

You have many options, so I'll speak from my experience running a startup with about 500k users, 100k active every month, which seems to be in your range.

We logged user actions in a MySQL database.

  1. Querying your data is very easy and fast (provided you have good indexes).
  2. We ran on Azure and had a dedicated MySQL instance (with slaves, etc.) for storing all user data, including logs. Space was not an issue.
  3. Logging to MySQL can be slow, depending on how much you log, so we just pushed each log entry to Redis and had a Python app read it from Redis and insert it into MySQL in the background (see the sketch after this list). This meant that logging had essentially no impact on page load times.
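
A rough sketch of that pattern, assuming the phpredis extension and PDO - the key, table, and column names are made up, and our actual consumer was the Python app mentioned above, shown here in PHP for illustration:

    <?php
    // Web request side: push the event onto a Redis list instead of
    // INSERTing directly, so the request is not slowed down by MySQL.
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);
    $redis->rPush('user_action_log', json_encode([
        'user_id' => $userId,                 // from your auth layer
        'ip'      => $_SERVER['REMOTE_ADDR'],
        'time'    => time(),
        'action'  => 'login',
    ]));

    // Background worker (run from the CLI): drain the list into MySQL.
    $pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $stmt = $pdo->prepare('INSERT INTO user_action_log (user_id, ip, time, action)
                           VALUES (?, ?, FROM_UNIXTIME(?), ?)');
    while (true) {
        $item = $redis->blPop(['user_action_log'], 5); // block for up to 5 s
        if (!$item) {
            continue; // timed out with nothing to do
        }
        $e = json_decode($item[1], true);
        $stmt->execute([$e['user_id'], $e['ip'], $e['time'], $e['action']]);
    }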

We decided to log in MySQL for user actions because:

  1. We wanted to run queries on anything at any time without much effort. The structured format of the user action logs made that incredibly easy to do.
  2. It also allows you to display certain logs to users, should you ever need to.
  3. When we introduced badges, we had no need to parse text logs to award badges to those who had performed a specific action X number of times. We simply wrote a query against the user action logs and the badges were awarded (a sketch follows below). So adding features based on actions was easy as well.
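
For instance, a hypothetical badge query against such a table (the names and the threshold are invented) is straightforward:

    <?php
    // Award a badge to everyone who performed a given action at least
    // 50 times, straight from the structured user action log.
    $stmt = $pdo->prepare('SELECT user_id, COUNT(*) AS n
                             FROM user_action_log
                            WHERE action = ?
                            GROUP BY user_id
                           HAVING n >= 50');
    $stmt->execute(['posted_comment']);
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
        // ... grant the badge to $row['user_id'] ...
    }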

We did use file logging for a couple of application logs - things we did not query on a daily basis - such as the logs of the Python app writing to the database, web server access and error logs, etc.

We used Logstash to process those logs. It can simply hook into a log file and stream it to your Logstash server. Logstash can also query your logs, which is pretty cool.

Advanced uses

We used Slack for team communications and integrated the Python database-writing app with it. This allowed us to send critical errors to a channel (via their API) where someone could act on a fix immediately.

Closing

My suggestion would be to not overthink it for now: log to MySQL, query it, and look at the stats. Make updates, rinse and repeat. You want to keep the cycle between deploy and update quick, so being able to make decisions from a quick SQL query helps.

Basically, what you want to avoid is logging into a server, finding a log, and grepping your way through it to find something; the approach above achieves that.

This is what we did; it is still running like that, and we have no plans to change it soon. We haven't had any issue where we could not find something we needed. If there is a massive burst of users and we scale to 1 million monthly active users, then we might change it.

Please note: whichever way you decide to log, if you are saving POST data, be sure never to do that for credit card info unless you are PCI compliant - or, better, use Stripe's JavaScript libraries.

Upvotes: 28

diversemix

Reputation: 569

The crux of the matter is that the data you are writing is never going to be changed. In my experience, in this scenario I would use either:

  • MySQL with the BLACKHOLE storage engine (a sketch follows below). Set it up right and it's blisteringly fast!
  • A Riak cluster (a NoSQL solution) - though this involves a learning curve, it may be one you eventually need to climb anyway.
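
For illustration, the BLACKHOLE pattern looks roughly like this (table and column names are assumptions): the master discards the rows locally, so writes are very cheap, but each INSERT still goes to the binary log and is replicated to a slave where the same table uses a real engine.

    <?php
    // Master: ENGINE=BLACKHOLE stores nothing locally, yet every INSERT
    // is written to the binary log and replicated to the slaves.
    $pdo->exec('CREATE TABLE user_action_log (
                    user_id INT UNSIGNED NOT NULL,
                    ip      VARCHAR(45)  NOT NULL,
                    time    DATETIME     NOT NULL,
                    action  VARCHAR(64)  NOT NULL
                ) ENGINE=BLACKHOLE');
    // Slave: the same table declared with ENGINE=InnoDB (or MyISAM)
    // actually stores the replicated rows for querying.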

Upvotes: 1

Pethő Jonatán

Reputation: 308

Use syslog ;) Set it up on another server and it can log all of your processes separately (such as networking, servers, SQL, Apache, and your PHP). It can be useful for you and will decrease the time you spend debugging. :)
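
Sending events there from PHP needs nothing beyond the built-in syslog functions; forwarding them to the central server is then a matter of syslog daemon configuration (the facility and message format below are just examples):

    <?php
    // Write an application event to syslog; the local daemon can be
    // configured to forward the LOG_LOCAL0 facility to a remote server.
    openlog('mywebapp', LOG_PID, LOG_LOCAL0);
    syslog(LOG_INFO, 'user=42 ip=1.2.3.4 action=login');
    closelog();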

Upvotes: 0

Jojo

Reputation: 2760

We do logging with the great tool Graylog.

It scales as high as you want it to, has great tools for data visualization, is incredibly fast even for complex queries and huge datasets, and the underlying search engine (Elasticsearch) is schemaless. The latter may be an advantage, as you get more flexibility in extending your logs without the hassle MySQL schemas can give you.

Graylog, Elasticsearch, and MongoDB (which is used to store the configuration of Graylog and its web interface) are easily deployable via tools like Puppet, Chef, and the like.

Actually logging to Graylog is easy with the PHP library Monolog.
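
A minimal sketch, assuming the monolog/monolog and graylog2/gelf-php Composer packages and a GELF UDP input listening on the default port 12201 (the host name is a placeholder):

    <?php
    require 'vendor/autoload.php';

    use Gelf\Publisher;
    use Gelf\Transport\UdpTransport;
    use Monolog\Handler\GelfHandler;
    use Monolog\Logger;

    // Ship log records to Graylog over GELF/UDP via Monolog.
    $publisher = new Publisher(new UdpTransport('graylog.example.com', 12201));
    $logger = new Logger('webapp');
    $logger->pushHandler(new GelfHandler($publisher));

    $logger->info('User logged in', ['user_id' => 42, 'ip' => '1.2.3.4']);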

Of course, the great disadvantage here is that you have to learn a bunch of new tools and software. But it is worth it, in my opinion.

Upvotes: 1

delatbabel

Reputation: 3681

Are there any file-logging frameworks available for PHP?

There is this, which is available on Packagist: https://packagist.org/packages/psr/log

Note that it's not a file-logging framework but an API for a logger based on the PSR-3 standard from the FIG. So, if you like, it's the "standard" logger interface for PHP. You can build a logger that implements this interface, or search around on Packagist for other loggers that implement it (either file- or MySQL-based). There are a few other loggers on Packagist (teacup, forestry), but it is preferable to use one that sticks to the PSR standard.
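
For illustration, a bare-bones file logger against that interface could look like this - a sketch assuming psr/log is installed via Composer; extending AbstractLogger means only log() has to be written:

    <?php
    require 'vendor/autoload.php';

    use Psr\Log\AbstractLogger;

    class FileLogger extends AbstractLogger
    {
        private $path;

        public function __construct($path)
        {
            $this->path = $path;
        }

        public function log($level, $message, array $context = []): void
        {
            // Interpolate {placeholders} from context, as PSR-3 suggests.
            foreach ($context as $key => $value) {
                $message = str_replace('{' . $key . '}', (string) $value, $message);
            }
            $line = sprintf("[%s] %s: %s\n", date('c'), $level, $message);
            file_put_contents($this->path, $line, FILE_APPEND | LOCK_EX);
        }
    }

    $log = new FileLogger('app.log');
    $log->info('User {id} logged in from {ip}', ['id' => 42, 'ip' => '1.2.3.4']);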

Upvotes: 2

Adam

Reputation: 18807

If you are sure that reading the log will mainly target one user at a time, you should consider partitioning your log table (http://dev.mysql.com/doc/refman/5.1/en/partitioning-range.html), using your user_id as the partitioning key.

The maximum number of partitions being 1024, each partition would hold roughly 1/1000th of your 100k users - about 100 users per partition - which is quite reasonable.
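
For illustration, a range-partitioned version of such a log table might be declared as below (table and column names are assumptions; in practice you would generate the full set of roughly 1000 partition clauses):

    <?php
    // Range-partition on user_id so reads for one user touch one partition.
    $pdo->exec('CREATE TABLE user_action_log (
                    user_id INT UNSIGNED NOT NULL,
                    ip      VARCHAR(45)  NOT NULL,
                    time    DATETIME     NOT NULL,
                    action  VARCHAR(64)  NOT NULL,
                    KEY idx_user_time (user_id, time)
                )
                PARTITION BY RANGE (user_id) (
                    PARTITION p0   VALUES LESS THAN (100),
                    PARTITION p1   VALUES LESS THAN (200),
                    PARTITION p2   VALUES LESS THAN (300),
                    PARTITION pmax VALUES LESS THAN MAXVALUE
                )');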

Upvotes: 2
