Rocket04

Reputation: 1631

What are some of the best tools/strategies to cache medium/large data sets in PHP?

I have your average PHP app (running on a Windows server) with forms and data grids/lists, some of which require running pretty complex queries that I've optimized as far as they'll go; I doubt there's much more that can be done to make them run faster. I also don't have the option of changing the database structure, given other processes that depend on it. Since caching hasn't really been used much in the app, that seems to be the next logical step.

I recently read up on generational caching and came up with a decent mechanism to automate caching of queries in my apps. My issue now is that I'm running into size limitations with both of the options that seemed like logical choices. WinCache limits you to a total of 85MB, which isn't going to cut it, and memcached limits an item to 1MB, which doesn't seem like much if a query returns a fairly large number of records with a lot of fields. To be exact, memcached now lets you configure a larger item size, but the mere fact that 1MB is the default, and used to be the hard limit, makes me question what I'm trying to do.
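For reference, the generational scheme I ended up with looks roughly like this (a minimal sketch; the in-memory `CacheStub` stands in for the real cache backend, and the function names are illustrative, not actual code from my app):

```php
<?php
// Minimal in-memory stand-in for WinCache/memcached, for illustration only.
class CacheStub {
    private array $store = [];
    public function get(string $key) { return $this->store[$key] ?? null; }
    public function set(string $key, $value): void { $this->store[$key] = $value; }
}

// Each table gets a "generation" counter; bumping it invalidates every
// cached query that touches the table, without deleting anything.
function generation(CacheStub $cache, string $table): int {
    $gen = $cache->get("gen:$table");
    if ($gen === null) { $gen = 1; $cache->set("gen:$table", $gen); }
    return $gen;
}

function bumpGeneration(CacheStub $cache, string $table): void {
    $cache->set("gen:$table", generation($cache, $table) + 1);
}

// The cache key embeds the SQL, the bound params, and the current
// generations of the tables involved, so keys built against stale
// generations simply never match again.
function queryKey(CacheStub $cache, string $sql, array $params, array $tables): string {
    $gens = array_map(fn($t) => generation($cache, $t), $tables);
    return 'q:' . md5($sql . serialize($params) . serialize($gens));
}
```

After a write to, say, `orders`, one call to `bumpGeneration($cache, 'orders')` effectively invalidates every cached query involving that table.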

The maximum number of records my grid allows at once is 1,000, so that's the most that could get stored in a single cache entry (fields per record vary, of course). I know that a huge number of users would probably fill the cache very quickly, but the number of concurrent users is usually not large, and from what I've read, if memcached runs out of memory it just evicts the oldest cached items. So I don't see a big downside to storing larger data sets, unless eviction is a very expensive operation, and from what I've read that doesn't seem to be the case.

So in summary, what I'm wondering is if it's a bad idea to store larger data sets in memcached (and granted, I know I don't want to store a query with a million records in there). And if it's a bad idea, what would be a good alternative for caching/improving performance when retrieving those data sets?

Upvotes: 10

Views: 3176

Answers (5)

Joe Watkins

Reputation: 17158

Unless you have a good reason to send that data over the wire to a cache, don't.

If at all possible, use a local, in-process caching solution such as APC(u) or YAC (YAC is extremely clever software, but may not be stable).

When APC(u), or WinCache for that matter, copies arrays and scalars in and out of shared memory, it does so bitwise, byte by byte; it does not serialize or otherwise change the format of the data. This, coupled with the fact that there is zero network overhead, makes local caching solutions like APC(u) much faster than anything like memcached or Redis.

APC(u) does not have any limitations on the size of segments or entries; you should be able to configure it to consume all the memory your operating system will allow you to map.
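A minimal read-through pattern with APCu looks like this (`apcu_fetch`/`apcu_store` are the real API; the fallback branch and the callback are illustrative so the sketch runs even where the extension is absent):

```php
<?php
// Read-through cache: return the cached value if present, otherwise
// compute it, store it, and return it. APCu keeps the data in-process,
// so there is no serialization or network round trip.
function cacheThrough(string $key, int $ttl, callable $compute) {
    if (function_exists('apcu_enabled') && apcu_enabled()) {
        $hit = false;
        $value = apcu_fetch($key, $hit);
        if ($hit) {
            return $value;
        }
        $value = $compute();
        apcu_store($key, $value, $ttl);
        return $value;
    }
    // Fallback for environments without APCu (illustration only).
    static $local = [];
    if (!array_key_exists($key, $local)) {
        $local[$key] = $compute();
    }
    return $local[$key];
}
```

Usage would be something like `$rows = cacheThrough('grid:orders:p1', 300, fn() => runExpensiveQuery());`, where `runExpensiveQuery` is whatever produces your result set.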

Of course, if you have a good reason to want to send over the wire, then this is pretty useless information ;)

Upvotes: 2

artuc

Reputation: 913

Without making big changes to your application, you have a few more options:

  • You can cache the whole front end. Just create a job that runs every 5 minutes (tune this to how often your data changes) and writes HTML files to a directory; when a client requests a URL from your server, serve the pre-generated HTML file. This can be very handy.

  • Using nginx's caching would also be a good choice.

  • There are also hardware appliances you can add to your network that can cache a really large amount of data without any trouble. nginx is a software solution and NetScaler is a hardware one; Citrix seems to have good offerings in this area.
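The first option above can be sketched in a few lines (paths and the render callback are placeholders, not a complete implementation):

```php
<?php
// Pre-generate a page to a file (run this from a scheduled job), then
// serve the static copy on request instead of hitting the database.
function regeneratePage(string $cacheDir, string $name, callable $render): void {
    if (!is_dir($cacheDir)) {
        mkdir($cacheDir, 0755, true);
    }
    // Write atomically: build in a temp file, then rename into place,
    // so a request never sees a half-written page.
    $tmp = tempnam($cacheDir, 'page');
    file_put_contents($tmp, $render());
    rename($tmp, "$cacheDir/$name.html");
}

function serveCached(string $cacheDir, string $name, int $maxAge): ?string {
    $file = "$cacheDir/$name.html";
    // Serve the file only while it is fresher than $maxAge seconds.
    if (is_file($file) && time() - filemtime($file) < $maxAge) {
        return file_get_contents($file);
    }
    return null; // fall through to the normal (dynamic) code path
}
```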

Upvotes: 1

Abhinav

Reputation: 8168

There are a lot of options you can go with:

1) If you are dealing with a lot of queries, a MASTER/SLAVE DB architecture will be a big help. SELECT queries can be performed on the slave DB, which takes an immense load off the master DB.

2) Using SPHINX will definitely help boost the speed of your data retrieval. You can read about it in the Wikipedia article on Sphinx.

3) You can also use a REDIS server, which also supports master/slave replication; see the Redis documentation on replication.

It also depends on other factors: the way you normalize the table structures, indexing, and how you perform joins.

NOTE: Unnecessary use of JOINs should generally be avoided. You can read about it in IBM's article on avoiding unnecessary outer joins.

Hope this helps

Upvotes: 2

Ivan Yonkov

Reputation: 7034

It would be better if memcached runs on a different server than the database, so you balance the load across two machines: non-cached results come directly from the database server, and cached ones from the other.

The only advice I can give here is: do not get cache-addicted. You can reach a point where, if your memcached server crashes for a while, all the load falls on the database server and it crashes as well.
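At minimum, treat a cache outage as a miss instead of a fatal error, so the app degrades rather than dies. A sketch of that idea (the two callables stand in for your real cache client and database query):

```php
<?php
// Degrade gracefully when the cache is unreachable: catch cache errors,
// log them, and fall back to the database instead of failing the request.
function fetchWithFallback(callable $cacheGet, callable $dbQuery) {
    try {
        $hit = $cacheGet();
        if ($hit !== null) {
            return $hit;
        }
    } catch (Throwable $e) {
        // Cache server down: note it and fall through to the database.
        error_log('cache unavailable: ' . $e->getMessage());
    }
    return $dbQuery();
}
```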

Upvotes: 1

Sven

Reputation: 5265

Since your web application runs a lot of complex queries that return big data sets, you should definitely look into caching the data.

If you have shell access to the system, I suggest using a highly praised and popular web application accelerator called Varnish, which can speed up your pages by a lot.
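A minimal Varnish configuration pointing at a PHP backend looks something like this (VCL 4.0 syntax; the host and port are placeholders for wherever your app server listens):

```vcl
vcl 4.0;

# Varnish sits in front and forwards cache misses to the PHP app.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}
```

Note that Varnish caches whole HTTP responses, so it helps most with pages that look the same for every user; by default, requests carrying cookies are passed straight through to the backend.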

Upvotes: 1
