Ehcache - why are the entries so big?

Question

I have a fairly simple data model like:

class MyParent {
    // 7 fields here, some numeric, some String, not longer than 50 chars total
    Set children;
}

class MyChild {
    int ownerId;
    // 3 more fields, numeric or dates
}

MyParent, MyChild and the MyParent.children collection are all cached with the read-only strategy.

I have 40,000 instances of MyParent and 100,000 instances of MyChild. That yields 180,000 entries in the cache (once you add the 40,000 MyParent.children collections).

I want to cache everything, grouped by ownerId. Not wanting to reinvent the wheel, I decided to use the query cache like this:

Query query = session.createQuery(
        "select distinct p from MyParent p join fetch p.children c where c.ownerId = :ownerId");
query.setParameter("ownerId", ownerId);
query.setCacheable(true);
query.setCacheRegion("MyRegion");
query.list();

I run this for all 1,500 values of ownerId.

The cache works, but I noticed it's huge! Measured with Ehcache.calculateInMemorySize(), each entry is on average over one kilobyte in size. To cache ~180,000 entries I would need over 200 MB. That's outrageous, given that the entries themselves are much smaller.

Where does the overhead come from and how can I decrease it?

Alex Snaps · Accepted Answer

I'm not sure from the question which cache you used to do the math, but let me use the MyParent class as an example. Given what you explained about the class, on a 64-bit VM with CompressedOops enabled, a MyParent instance would be a little below 500 bytes in heap. That is without the Set; I'll explain why later (it would be another 128 bytes on top otherwise). The cache also needs to hold the key for that entry, which is added to the calculation.

Hibernate doesn't directly use the primary key as the key for what it stores in the cache, but a CacheKey instance. That instance holds the pk of the entity the value represents as well as four other fields: type, the Hibernate type mapping; entityOrRoleName, the entity or collection-role name; tenantId, the tenant identifier associated with this data; and finally hashCode, the hash code of the pk (see org.hibernate.type.Type.getHashCode).
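
To make the shape of such a key concrete, here is a minimal Java sketch. It is not Hibernate's CacheKey source: the class name SketchCacheKey, the field types, and the equality rule are assumptions made for illustration. Only the five field names come from the description above.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical, simplified stand-in for the key wrapped around a primary key.
// Field names follow the description above; types and equality are assumptions.
final class SketchCacheKey implements Serializable {
    private final Serializable id;          // the entity's primary key
    private final String type;              // stands in for the Hibernate type mapping
    private final String entityOrRoleName;  // entity or collection-role name
    private final String tenantId;          // tenant identifier, may be null
    private final int hashCode;             // hash of the pk, computed once and stored

    SketchCacheKey(Serializable id, String type, String entityOrRoleName, String tenantId) {
        this.id = Objects.requireNonNull(id);
        this.type = type;
        this.entityOrRoleName = entityOrRoleName;
        this.tenantId = tenantId;
        this.hashCode = id.hashCode();
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof SketchCacheKey)) return false;
        SketchCacheKey that = (SketchCacheKey) other;
        return id.equals(that.id)
                && Objects.equals(entityOrRoleName, that.entityOrRoleName)
                && Objects.equals(tenantId, that.tenantId);
    }

    @Override
    public int hashCode() {
        return hashCode;
    }
}

class SketchCacheKeyDemo {
    public static void main(String[] args) {
        SketchCacheKey a = new SketchCacheKey(42L, "long", "com.example.MyParent", null);
        SketchCacheKey b = new SketchCacheKey(42L, "long", "com.example.MyParent", null);
        System.out.println(a.equals(b)); // prints "true"
    }
}
```

Every cached entry carries one such wrapper, so five references or values sit in the heap alongside the pk itself before the cached value is even counted.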

Sadly, it doesn't end there: the value for that entry isn't the MyParent instance, but a CacheEntry instance. Besides more metadata (subclass, the entity's name, which defaults to the FQCN; lazyPropertiesAreUnfetched, a boolean; and the optimistic locking value taken from the entity), that instance still doesn't hold the MyParent instance itself, but a disassembled representation of it. This representation is an array of the state (all properties) of the entity.
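
The idea of a disassembled value can be sketched in Java as follows. This is an illustration under stated assumptions, not Hibernate's CacheEntry: SketchCacheEntry, SketchParent, and the disassemble helper are hypothetical names, and the real disassembly goes through Hibernate's type mappings rather than a hand-written array.

```java
import java.io.Serializable;

// Hypothetical entity with two properties, standing in for MyParent.
final class SketchParent {
    final long id;
    final String name;
    final double amount;

    SketchParent(long id, String name, double amount) {
        this.id = id;
        this.name = name;
        this.amount = amount;
    }
}

// Hypothetical, simplified stand-in for the cached value: metadata plus
// the entity's property values flattened into an array.
final class SketchCacheEntry implements Serializable {
    private final Serializable[] disassembledState; // one slot per property
    private final String subclass;                  // entity name, here the class name
    private final boolean lazyPropertiesAreUnfetched;
    private final Object version;                   // optimistic locking value, may be null

    private SketchCacheEntry(Serializable[] state, String subclass,
                             boolean lazyPropertiesAreUnfetched, Object version) {
        this.disassembledState = state;
        this.subclass = subclass;
        this.lazyPropertiesAreUnfetched = lazyPropertiesAreUnfetched;
        this.version = version;
    }

    // Copies the property values out of the entity; the entity itself is not kept.
    static SketchCacheEntry disassemble(SketchParent entity) {
        Serializable[] state = new Serializable[] {
            entity.name,
            Double.valueOf(entity.amount)
        };
        return new SketchCacheEntry(state, entity.getClass().getName(), false, null);
    }

    Serializable[] state() {
        return disassembledState;
    }

    String subclass() {
        return subclass;
    }
}

class SketchCacheEntryDemo {
    public static void main(String[] args) {
        SketchCacheEntry entry = SketchCacheEntry.disassemble(new SketchParent(7L, "Alice", 12.5));
        System.out.println(entry.state().length); // prints "2"
    }
}
```

In this sketch the array and its boxed elements are extra objects on top of the metadata fields, which is one reason a measured entry can be larger than the plain entity.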

I guess that with this information, the "estimated" sizes of your Hibernate caches will make more sense. I'd like to stress that these are only estimates and, if I remember correctly how they are calculated, they are probably slightly above reality. Indeed, some information in the CacheKey, for instance, should probably be accounted for differently. As of Ehcache 2.5, you will be able to enable memory-based tuning on Caches (and even at the CacheManager level). When that is enabled, cache entries are precisely measured and calculateInMemorySize() will give you the real measured size of the cache.

You can download the 2.5 beta now from ehcache.org. Also note that when using byte-based sizing on your caches, the sizing engine will account for instances that are shared across cached entries in Hibernate's cache types. You can read more on how this all works here: http://ehcache.org/documentation/configuration.html#Memory_Based_Cache_Sizing_Ehcache_2.5_and_higher
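
As a rough illustration of why shared instances matter when sizing, the Java sketch below compares a naive per-entry count with one that visits each object only once. It is not Ehcache's sizing engine: SharedInstanceCounter and its methods are hypothetical, and it counts object instances rather than bytes to keep the example small.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical helper contrasting two ways of walking cached entries.
final class SharedInstanceCounter {
    // Counts every slot of every entry, so an object referenced by
    // several entries is counted once per reference.
    static int naiveCount(List<Object[]> entries) {
        int count = 0;
        for (Object[] entry : entries) {
            count += entry.length;
        }
        return count;
    }

    // Counts each distinct object instance once, using reference
    // identity (==) rather than equals().
    static int identityAwareCount(List<Object[]> entries) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        for (Object[] entry : entries) {
            for (Object o : entry) {
                seen.add(o);
            }
        }
        return seen.size();
    }
}

class SharedInstanceCounterDemo {
    public static void main(String[] args) {
        // One entity-name instance shared by both entries (an assumption for the demo).
        String sharedName = new String("com.example.MyParent");
        List<Object[]> entries = new ArrayList<Object[]>();
        entries.add(new Object[] { sharedName, "pk-1" });
        entries.add(new Object[] { sharedName, "pk-2" });

        System.out.println(SharedInstanceCounter.naiveCount(entries));         // prints "4"
        System.out.println(SharedInstanceCounter.identityAwareCount(entries)); // prints "3"
    }
}
```

With 180,000 entries pointing at a handful of shared names and type objects, the gap between the two approaches grows with the number of entries.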

Hope that helps you make more sense out of it all... Alex
