user849137

Reputation:

Caching data with class properties - why is it a bad idea?

I've recently been reading many articles on scalability for PHP apps. Nearly all of the articles I've read mention caching, so I came up with the idea of caching DB data in class properties to prevent excess DB queries. I wanted to share the idea, so I blogged about it, only to have my teacher tell me it was pointless and silly. Apart from using the words "pointless" and "silly", he couldn't really explain why it was bad. Could someone here please explain why this method of caching, as a way to help scale PHP applications, is bad?

The method:

Theory:

Instead of fetching the data from the DB in every method where it's needed, executing query after query, I thought it would be a good idea to have a class property (variable) that stores the fetched DB data, to prevent the need for duplicate queries or queries that would return the same data.

If you didn't get that, here's an example taken from my blog:


I'm going to bring Facebook into this example, just to ease the explanation a bit. Let's say we were re-coding the user class for the social network.

class FBuser
{

}

The obvious methods this class would contain:

getStatusUpdates()
getAccountInfo()
getFriendIDs()

Originally, those methods would each have to execute database queries to get the required data. But with the caching method, I would define a class property to store the cached data, and have all the DB querying happen in a single method:

class FBuser
{
    private $userCache = array();

    private function getData( $dataToGet = '' )
    {
        //all of my db querying would happen here
    }
}

But in that same method, I would also check the cache first, if I'm allowed to do so:

private function getData( $dataToGet = '' , $useCache = true )
{
    //am I allowed to use cache?
    if ( $useCache === true )
    {
        //does the appropriate data exist in cache?
        if ( isset($this->userCache[ $dataToGet ]) )
        {
            //return the cached data, and forget about the DB queries
            return $this->userCache[ $dataToGet ];
        }
    }

    //if we get here, caching was disabled or the required data has not yet been cached :(
    //all of my db querying would happen here

    //store the data that's just been fetched by the queries in the cache property
}

This way, I could call getData( 'the data I want' , true ); whenever I want to fetch data from the DB, allowing me to use the cached data where and when possible.

So if I ever needed to call getAccountInfo(), getStatusUpdates() or getFriendIDs() multiple times, this method would prevent multiple DB queries from being executed, which (I would think) is good for scaling.
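For completeness, here's roughly how I'd imagine the fleshed-out version looking. The PDO connection, table names and query map below are just placeholders to illustrate the idea, not the real schema:

class FBuser
{
    private $userCache = array();
    private $db;
    private $userId;

    public function __construct( PDO $db, $userId )
    {
        $this->db     = $db;
        $this->userId = $userId;
    }

    private function getData( $dataToGet = '' , $useCache = true )
    {
        //am I allowed to use cache, and has this data already been cached?
        if ( $useCache === true && isset($this->userCache[ $dataToGet ]) )
        {
            return $this->userCache[ $dataToGet ];
        }

        //placeholder queries - the real ones depend entirely on the schema
        $queries = array(
            'statusUpdates' => 'SELECT * FROM status_updates WHERE user_id = ?',
            'accountInfo'   => 'SELECT * FROM users WHERE id = ?',
            'friendIDs'     => 'SELECT friend_id FROM friends WHERE user_id = ?',
        );

        $stmt = $this->db->prepare( $queries[ $dataToGet ] );
        $stmt->execute( array( $this->userId ) );
        $result = $stmt->fetchAll( PDO::FETCH_ASSOC );

        //store the freshly fetched data in the cache property for later calls
        $this->userCache[ $dataToGet ] = $result;

        return $result;
    }

    public function getStatusUpdates() { return $this->getData( 'statusUpdates' ); }
    public function getAccountInfo()   { return $this->getData( 'accountInfo' );   }
    public function getFriendIDs()     { return $this->getData( 'friendIDs' );     }
}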


Upvotes: 3

Views: 1538

Answers (2)

Mahn

Reputation: 16585

Why is it a bad idea?

Strictly speaking it is not a bad idea per se, in that it will do what you expect it to do, and there is a little bit of performance to gain if you have duplicate queries in your script.

In practice, however, unless your script is doing something very, well, unusual, the number of database calls per request in your typical PHP script won't exceed 15 or 20, and of those perhaps only 2 or 3 are duplicates, tops. If the database calls were already relatively fast, axing 2 or 3 of them would make a negligible difference in performance. Not to mention the database itself may already have caching systems in place!

Implementing a persistent cache, one that lives between requests, is where the potential performance jackpot is, depending on your app/scripts.

I'm not saying "don't do it"; I'm just saying that unless you plan to run the same query hundreds of times within the same request/script, which is generally unlikely, you won't see much out of it without a persistent solution. But it will absolutely not hurt.
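To illustrate the persistent angle, here's a minimal sketch assuming the APCu extension is available (the key name, TTL and query are arbitrary examples, not a recommendation for any particular setup):

function getAccountInfo( PDO $db, $userId )
{
    $key = 'account_info_' . $userId;

    //try the shared-memory cache first; it survives between requests
    $cached = apcu_fetch( $key, $success );
    if ( $success )
    {
        return $cached;
    }

    //cache miss: query the database and keep the result around for 5 minutes
    $stmt = $db->prepare( 'SELECT * FROM users WHERE id = ?' );
    $stmt->execute( array( $userId ) );
    $row = $stmt->fetch( PDO::FETCH_ASSOC );

    apcu_store( $key, $row, 300 );

    return $row;
}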

Upvotes: 3

AdamJonR

Reputation: 4713

Your teacher is silly :p

The main point I'd make is that this type of caching, depending on the context, can in fact be very helpful. I do this in places in the web framework I've developed, and that refactoring has been driven by careful analysis of cachegrind output generated with Xdebug.

Think about it this way. Your DB access is some of the most expensive (in terms of performance) work your PHP script will perform. It's easy to find pages where the DB-related calls are responsible for 50% (or more) of the total execution time of the page. Why not cache the results so that any reuse of the data automatically benefits?

There's no reason not to in terms of PHP resource allocation either: under the covers, PHP will share references to the zvals unless they're modified, so your script won't require more memory on the heap.
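You can see this copy-on-write behaviour for yourself with a quick standalone demo (nothing framework-specific, just core PHP):

//build a reasonably large array, then copy it without modifying the copy
$big    = range( 1, 100000 );
$before = memory_get_usage();

$copy        = $big;                 //no real copy yet, just a refcount bump
$afterAssign = memory_get_usage();

$copy[] = 42;                        //the write forces the actual duplication
$afterWrite  = memory_get_usage();

printf( "assign: +%d bytes, write: +%d bytes\n",
    $afterAssign - $before,
    $afterWrite - $afterAssign );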

For those who doubt this, I'd challenge them to run XDebug on a page that makes one DB call instead of two and to declare to the world that they can't see a significant result. When the code to implement this is so simple, why not make the improvement?

Now, some may point to more persistent forms of caching and say you should use them INSTEAD of this. I disagree with the universality implied by that type of response. Perhaps the data set is too large to cache on the server in its entirety; for example, I'm not going to cache every person's data in memory when only 1% of the users log in each day. It's not worth the memory on the server. Perhaps the data is updated frequently, in which case synchronizing becomes an issue/burden that can outweigh the benefits of caching. The point I'd make is that there are situations where more persistent forms of caching are not as appropriate.

Be green, every cycle counts :)

Upvotes: 1
