ashwinsakthi

Reputation: 1956

Are DB hits costlier than accessing a collection in Java?

I just implemented a design in which I cache some data in a HashMap and read it from there instead of querying the same data from the DB again.

Is my thinking correct?

Upvotes: 4

Views: 375

Answers (7)

parsifal

Reputation: 1663

You can answer this yourself if you think through what happens when you talk to the database:

  1. Your program has to send the query to the database. Depending on whether the database server is running in-process or somewhere else on the network, this may take anywhere from a few microseconds to a few milliseconds.
  2. The database server has to parse your query and generate an execution plan. Depending on the server, it might cache an execution plan for frequently executed queries. If not, plan on another few microseconds to generate the plan.
  3. The database server has to execute your plan, reading whatever disk blocks are needed to access the data. Each disk access will take tens of milliseconds. Depending on how large the table is, and how well it is indexed, your query might take seconds.
  4. The database server has to package up the data and send it back to the application. Again, depending on whether it's in-process or across the network, this will take microseconds to milliseconds, and it will vary depending on how much data is sent back.
  5. Your application must convert the retrieved data into a useful form. This is probably a microsecond or less.

By comparison, a lookup on a hashed data structure requires a few memory accesses, which may take a few nanoseconds each. The difference is several orders of magnitude.
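As a rough illustration of the design described in the question, here is a minimal read-through cache sketch. The class name `ReadThroughCache`, the `loader` function (a stand-in for the real DB query) and the `loadCount()` helper are all hypothetical and not taken from the original code; the point is only that the expensive call happens once per key and every later lookup is a plain in-memory map access.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical read-through cache; all names are illustrative assumptions.
final class ReadThroughCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;              // stands in for the real DB query
    private final AtomicInteger loads = new AtomicInteger();

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent invokes the loader only when the key has no mapping yet.
        // If the loader returns null, nothing is stored and null is returned.
        return entries.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    // Number of times the loader (the simulated DB hit) was actually called.
    int loadCount() {
        return loads.get();
    }
}
```

With something like `new ReadThroughCache<Integer, String>(id -> "user-" + id)`, calling `get(7)` twice should invoke the loader once. This sketch never evicts or refreshes entries, so it only suits data that is small and does not change.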

Upvotes: 3

Christophe Roussy

Reputation: 16999

The map lookup should be much faster, provided the cost of computing the hash code is low. It also depends on the number of entries, since more entries can mean more collisions.

Upvotes: 0

Marko Topolnik

Reputation: 200148

The primary concern to take into account is the size of your cache: past a certain threshold, you are doing more harm than good. For example, if the cache has a million entries and each entry is 1 KB (not so hard to reach, given the overhead of each object), you have occupied a full gigabyte of heap. The performance of a major GC will also be terrible in that case.
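One common way to keep the cache from growing without bound is to cap the number of entries and evict the least recently used one. A minimal sketch, assuming a single-threaded caller (`LinkedHashMap` is not thread-safe) and a hypothetical class name `BoundedCache`:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical size-capped LRU cache built on LinkedHashMap.
// Not thread-safe; wrap or synchronize it if several threads share it.
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // accessOrder = true makes iteration order follow recency of access,
        // so the "eldest" entry is the least recently used one.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by put/putAll after an insertion; returning true evicts the eldest entry.
        return size() > maxEntries;
    }
}
```

The right value for `maxEntries` depends on the entry size and the heap you can spare, so it is better measured than guessed.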

Upvotes: 2

Isaac

Reputation: 16736

Look at it this way: to query the database, bytes have to be copied into memory anyway. Therefore, just accessing memory will always be faster than hitting a database.

Upvotes: 0

Pradeep Simha

Reputation: 18123

Hitting the DB is always costlier than anything you do at the code level.

Upvotes: 0

Bohemian

Reputation: 424983

Hitting a Collection is going to be several orders of magnitude faster than hitting a DB, especially one on another server, because of the network latency involved.

That said:

  • Databases can cache data themselves, so this optimization may not be necessary
  • If the data is very large, you will have to deal with memory consumption
  • Updates to data must be dealt with, for example by invalidating the cache
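To illustrate the last point, here is a minimal sketch of invalidating a cached entry on update. The class `InvalidatingStore` is hypothetical, and the `database` map is only a stand-in for the real table; it assumes every write goes through this class, which will not hold if other processes modify the DB directly.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical store that keeps a cache in front of a simulated database.
final class InvalidatingStore {
    private final Map<Integer, String> database = new HashMap<>(); // stand-in for the real table
    private final Map<Integer, String> cache = new HashMap<>();

    String read(int id) {
        String cached = cache.get(id);
        if (cached != null) {
            return cached;                 // cache hit: no "DB" access
        }
        String fresh = database.get(id);   // cache miss: fall back to the "DB"
        if (fresh != null) {
            cache.put(id, fresh);
        }
        return fresh;
    }

    void update(int id, String value) {
        database.put(id, value);           // write to the source of truth first
        cache.remove(id);                  // then drop the stale copy; the next read reloads it
    }
}
```

Removing the entry instead of overwriting it keeps the write path simple: the next `read` repopulates the cache from the source of truth.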

Upvotes: 4

NPE

Reputation: 500217

Keeping a copy of the data in memory would almost certainly be faster than fetching it from the DB.

That said, there are further considerations to be taken into account:

  1. Can the data in the DB change while you're holding an in-memory copy? If so, how are you going to deal with that?
  2. Is memory consumption going to be an issue?
  3. Are you certain that you're optimizing a real, and not an imagined, bottleneck?

Upvotes: 7
