gil
gil

Reputation: 2552

cache managing in search engine

i wonder what's the ultimate solution for a cache managing. let's say i have a single server and unbounded number of clients connected to it. the clients keeps sending search requests to the server (lets call the search request of the client -'x') and the server answers to the client with some -'y'. now, in order to speed up my search engine i want to save the most popular requests(x) in the cache memory and their answers(y). (note that it's important for every same x the clients are sending to the server , they must get the same y). i also got a database which hold all the previous requests(x,y,z-number of times x requested).

i've met some issues while updating the cache:

  1. how can i tell in which index my least popular request stays inside the cache, in order to replace it if i find a more popular query inside my database(without scan the whole cache of course).

  2. how should i update the cache ? (just scan the whole database? that's pretty expensive operation)

3.how much request elements should my cache contain?

4.do you think a HashMap is an efficient data structure to maintain a cache?(i'm working with java).

5.i was thinking about updating my cache based on the last T-(some number) queries, and not update it while going through all requests on the database. because maybe there's requests which used to be very popular and they aren't popular anymore, and if there's a new popular request it can take long time until it gets inside the cache based on the number of shows(it also must be faster since i don't have to scan the whole database) . is that an legitimate way to manage a cache?

Upvotes: 1

Views: 288

Answers (2)

Slava Imeshev
Slava Imeshev

Reputation: 1390

Here are some thoughts:

  1. A typical cache stores cached values indexed by a hash code of a request. So if you know the request you should be able to invalidate the cache based on that. Or you can use a reasonable expiration period and the cache API would remove expired elements automatically.

  2. Updating the cache. If you your data is stored in a DB, the best way to do it is to use updated counters or timestamps stored in the DB. When a request comes in and the cached request has a timestamp different from the DB, it times to read it from the DB in full. Cacheonix caches SQL queries using its DataSource API.

  3. As for the size of the cache, it should be big enough to maintain healthy hit / miss ration, something around 80%. At the same time, you want to limit the byte size of the cache to avoid running out of memory.

  4. HashMaps are not that good for caching because they don't offer a meaningful level of concurrency and eviction based on size and a lot of other issues. The there are a few production-grade cache APIs and you may add Cacheonix to the list.

  5. A cache API should offer your plenty of ways to maintain the cache up to date, from LRU eviction to byte-size eviction to custom DataSources, but ultimately it depends on your business logic.

Upvotes: 0

cruftex
cruftex

Reputation: 5723

A bachelor asked the computer to find him the perfect partner.

“I want a companion who is small and attractive, loves water sports and enjoys group activities.”

The computer answered: “Mary a penguin”

(Quote from: http://www.recipeapart.com/perfect-partner/#ixzz48iEVSp1y)

If you have an unbounded number of clients, the ultimate cache solution would be to make the clients forward your data. You can do that with the internet. Sample applications that are doing that are available, e.g. bit torrent.

When you have narrowed down your requirements look at the various Open Source Java cache implementations:

  • Apache Ignite
  • Apache Java Caching System
  • Apache Geode
  • inifinispan
  • hazelcast
  • EHCache
  • Google Guava
  • Caffeine
  • cache2k

Start to use one. Read the manuals.

Read my blog at: cruftex.net

Different scenarios require different solutions.

As far as I know, none of these projects have succeeded to build the ultimate cache. As far as I know, no user is known to have the ultimate cache, by using all of the current implementations.

May be I should name my cache implementation the "ultimate cache". But then, it wouldn't exist.

Upvotes: 1

Related Questions