Reputation: 178630
My application includes a client, web tier (load balanced), application tier (load balanced), and database tier. The web tier exposes services to clients, and forwards calls onto the application tier. The application tier then executes queries against the database (using NHibernate) and returns the results.
Data is mostly read, but writes occur fairly frequently, particularly as new data enters the system. Much more often than not, data is aggregated and those aggregations are returned to the client - not the original data.
Typically, users will be interested in the aggregation of recent data - say, from the past week. Thus, to me it makes sense to introduce a cache that includes all data from the past 7 days. I cannot just cache entities as and when they are loaded because I need to aggregate over a range of entities, and that range is dictated by the client, along with other complications, such as filters. I need to know whether - for a given range of time - all data within that range is in the cache or not.
In my ideal fantasy world, my services would not have to change at all:
public AggregationResults DoIt(DateTime starting, DateTime ending, Filter filter)
{
    // execute HQL/criteria call and have it automatically use the cache where possible
}
There would be a separate filtering layer that would hook into NHibernate and intelligently and transparently determine whether the HQL/criteria query could be executed against the cache or not, and would only go to the database if necessary. If all the data was in the cache, it would query the cached data itself, kind of like an in-memory database.
However, on first inspection, NHibernate's second level cache mechanism does not seem appropriate for my needs. What I'd like to be able to do is:
Now, I'm pretty sure #3 is asking too much. Even if I can get the cache populated with all the data required, NHibernate has no idea how to efficiently query that data. It would literally have to loop over all entities in order to discriminate which are relevant to the query (which might be fine, to be honest). Also, it would require an implementation of NHibernate's query engine that executed against objects rather than a database. But I can dream, right?
Assuming #3 is asking too much, I would require some logic in my services like this:
public AggregationResults DoIt(DateTime starting, DateTime ending, Filter filter)
{
    if (CanBeServicedFromCache(starting, ending, filter))
    {
        // execute some LINQ to Objects code or whatever to determine the aggregation results
    }
    else
    {
        // execute HQL/criteria call to determine the aggregation results
    }
}
This isn't ideal because each service must be cache-aware, and must duplicate the aggregation logic: once for querying the database via NHibernate, and once for querying the cache.
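For illustration, this is roughly what I imagine sitting behind CanBeServicedFromCache - a purely hypothetical in-memory index of which whole days have been fully loaded into my own cache (ignoring the filter for brevity):
using System;
using System.Collections.Generic;

public class AggregationCacheIndex
{
    // Tracks which whole days are fully present in my own cache.
    // This is my own bookkeeping only - it knows nothing about NHibernate's second level cache.
    private readonly HashSet<DateTime> cachedDays = new HashSet<DateTime>();
    private readonly object sync = new object();

    public void MarkDayAsCached(DateTime day)
    {
        lock (sync)
        {
            cachedDays.Add(day.Date);
        }
    }

    // True only if every day in [starting, ending] is known to be fully cached.
    public bool CanBeServicedFromCache(DateTime starting, DateTime ending)
    {
        lock (sync)
        {
            for (var day = starting.Date; day <= ending.Date; day = day.AddDays(1))
            {
                if (!cachedDays.Contains(day))
                {
                    return false;
                }
            }
            return true;
        }
    }
}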
That said, it would be nice if I could at least store the relevant data in NHibernate's second level cache. Doing so would allow other services (that don't do aggregation) to transparently benefit from the cache. It would also ensure that I'm not doubling up on cached entities (once in the second level cache, and once in my own separate cache) if I ever decide the second level cache is required elsewhere in the system.
I suspect if I can get a hold of the implementation of ICache at runtime, all I would need to do is call the Put() method to stick my data into the cache. But this might be treading on dangerous ground...
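Just to make that concrete, this is the kind of thing I had in mind - a rough sketch assuming the NHibernate 2.x/3.x API (ISessionFactoryImplementor.GetSecondLevelCacheRegion and NHibernate.Cache.ICache do exist; the region name and key format are made up, and since NHibernate keys its own entries with CacheKey objects, anything I Put this way would only ever be read back by my own code):
using System;
using NHibernate;
using NHibernate.Cache;
using NHibernate.Engine;

public class AggregationCachePrimer
{
    private readonly ISessionFactory sessionFactory;

    public AggregationCachePrimer(ISessionFactory sessionFactory)
    {
        this.sessionFactory = sessionFactory;
    }

    public void Prime(DateTime day, AggregationResults results)
    {
        // GetSecondLevelCacheRegion lives on the implementor interface, not on ISessionFactory itself.
        var implementor = (ISessionFactoryImplementor)sessionFactory;

        // "aggregation" is a hypothetical region name; it must be configured with the cache provider,
        // otherwise this can return null.
        ICache cache = implementor.GetSecondLevelCacheRegion("aggregation");

        // Invented key format - these entries are invisible to NHibernate's own entity/query caching.
        cache.Put("aggregation/" + day.ToString("yyyy-MM-dd"), results);
    }
}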
Can anyone provide any insight as to whether any of my requirements can be met by NHibernate's second level cache mechanism? Or should I just roll my own solution and forgo NHibernate's second level cache altogether?
Thanks
PS. I've already considered a cube to do the aggregation calculations much more quickly, but that still leaves me with the database as the bottleneck. I may well use a cube in addition to the cache, but the lack of a cache is my primary concern right now.
Upvotes: 0
Views: 1328
Reputation: 4397
Stop using your transactional (OLTP) data source for analytical (OLAP) queries and the problem goes away.
When a domain-significant event occurs (e.g. a new entity enters the system or an existing one is updated), fire an event (a la domain events). Wire up a handler for that event which takes the details of the created or updated entity and stores the data in a denormalised reporting store specifically designed to support the aggregate reporting you need (most likely pushing the data into a star schema). Your reporting then becomes simply the querying of aggregates (which may even be precalculated) along predefined axes, requiring nothing more than a simple select and a few joins. Querying can be carried out using something like LINQ to SQL or even simple parameterised queries and data readers.
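As a rough sketch (every name here - the event, the handler, the DailyAggregates table and its columns - is invented for illustration; the point is simply that the write side pushes pre-aggregated rows into the reporting store as events occur):
using System;
using System.Data;

// Hypothetical domain event raised by the write side when new data enters the system.
public class MeasurementRecorded
{
    public DateTime OccurredOn { get; set; }
    public string Category { get; set; }
    public decimal Value { get; set; }
}

// Handler that denormalises the event into a reporting store keyed by day and category,
// so the read side only ever performs a simple keyed lookup or small range scan.
public class MeasurementRecordedHandler
{
    private readonly IDbConnection reportingConnection; // connection to the reporting store

    public MeasurementRecordedHandler(IDbConnection reportingConnection)
    {
        this.reportingConnection = reportingConnection;
    }

    public void Handle(MeasurementRecorded e)
    {
        using (var cmd = reportingConnection.CreateCommand())
        {
            // Try to update an existing pre-aggregated row for this day/category...
            cmd.CommandText =
                @"UPDATE DailyAggregates
                     SET Total = Total + @value, EntryCount = EntryCount + 1
                   WHERE Day = @day AND Category = @category";
            AddParameter(cmd, "@value", e.Value);
            AddParameter(cmd, "@day", e.OccurredOn.Date);
            AddParameter(cmd, "@category", e.Category);

            // ...and insert it if it doesn't exist yet.
            if (cmd.ExecuteNonQuery() == 0)
            {
                cmd.CommandText =
                    @"INSERT INTO DailyAggregates (Day, Category, Total, EntryCount)
                      VALUES (@day, @category, @value, 1)";
                cmd.ExecuteNonQuery();
            }
        }
    }

    private static void AddParameter(IDbCommand cmd, string name, object value)
    {
        var p = cmd.CreateParameter();
        p.ParameterName = name;
        p.Value = value;
        cmd.Parameters.Add(p);
    }
}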
Performance gains should be significant as you can optimise the read side for fast lookups across many criteria while optimising the write side for fast lookups by id and reduced index load on write.
Additional performance and scalability come once you have migrated to this approach: you can then physically separate your read and write stores and run n read stores for every write store, allowing your solution to scale out to meet increasing read demand while write demand grows at a lower rate.
Upvotes: 2
Reputation: 2666
When looking into the details of the NHibernate cache, I remember reading that you should not rely on the cache being there, which seems like good advice.
Instead of trying to make your O/R mapper cover your application's needs, I think rolling your own data/cache management strategy might be more reasonable.
Also, the 7-day caching rule you talk about sounds like a business rule, which is something the O/R mapper should not know about.
In conclusion: make your app work without any caching at all, then use a profiler (or several - a .NET profiler, SQL Profiler, NHibernate Profiler) to see where the bottlenecks are, and start improving the "red" parts by adding caching or other optimizations where they are actually needed.
PS: about caching in general - in my experience, one caching point is fine, two caches are in the gray zone (you should have a strong reason for the separation), and more than two is asking for trouble.
Hope it helps.
Upvotes: 0
Reputation: 25946
Define 2 cache regions "aggregation" and "aggregation.today" with a large expiry time. Use these for your aggregation queries for previous days and today respectively.
In DoIt(), make one NH query per day in the requested range, using cacheable queries. Combine the query results in C#.
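A rough sketch of DoIt() along those lines (SetCacheable, SetCacheRegion and SetDateTime are real NHibernate query methods; the Measurement entity, the sessionFactory field and the combining step are placeholders for your own model):
public AggregationResults DoIt(DateTime starting, DateTime ending, Filter filter)
{
    var rows = new List<object[]>();

    using (var session = sessionFactory.OpenSession())
    {
        for (var day = starting.Date; day <= ending.Date; day = day.AddDays(1))
        {
            // One cacheable query per whole day, so each day's results are cached
            // and reused independently of the requested range.
            var region = day == DateTime.Today ? "aggregation.today" : "aggregation";

            var daily = session
                .CreateQuery("select e.Category, sum(e.Value), count(e.Id) " +
                             "from Measurement e " +
                             "where e.OccurredOn >= :start and e.OccurredOn < :end " +
                             "group by e.Category")
                .SetDateTime("start", day)
                .SetDateTime("end", day.AddDays(1))
                .SetCacheable(true)
                .SetCacheRegion(region)
                .List<object[]>();

            rows.AddRange(daily);
        }
    }

    // Combine the per-day aggregates (and apply the filter) in memory.
    return AggregationResults.FromDailyRows(rows, filter);
}
This assumes query caching is enabled in your NHibernate configuration.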
Prime the cache with a background process which calls DoIt() periodically with the date range that you need to be cached. The interval between runs must be shorter than the expiry time of the aggregation cache regions.
When today's data changes, clear cache region "aggregation.today". If you want to reload this cache region quickly, either do so immediately or have another, more frequent background process which calls DoIt() for today.
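Clearing that region is a one-liner against the session factory (EvictQueries is an existing ISessionFactory method; where you call it from - the service that writes today's data, an event listener, etc. - is up to you):
// After a write that affects today's data, drop the cached query results for today
// so the next DoIt() call re-queries the database for the current day only.
sessionFactory.EvictQueries("aggregation.today");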
When you have query caching enabled, NHibernate will pull the results from the cache if possible, based on the query and its parameter values.
Upvotes: 1