Rony

Reputation: 101

Speed a search cache without using too much memory

I have to access a database with 380,000 entries. I don't have write access to the DB; I can only read it. I've made a search function that uses a map to search for users by first name. Here is my process:

1 - Load everything from the DB.
2 - Store everything into a Map<Character, ArrayList<User>>, keyed by the first letter of each user's first name.

<A> {Alba, jessica, Alliah jane, etc ...}
<B> {Birsmben bani, etc ...}

When someone searches for a user, I take the first letter of the typed first name, call map.get(firstLetter), then iterate over the ArrayList to find all the matching users.
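For clarity, here is a minimal sketch of the current approach (the User class, field names, and method names are simplified placeholders, not my real code):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class FirstLetterIndex {

        // Placeholder User class: only the fields needed for this example.
        public static class User {
            final String firstName;
            final String lastName;
            User(String firstName, String lastName) {
                this.firstName = firstName;
                this.lastName = lastName;
            }
            String getFirstName() { return firstName; }
        }

        private final Map<Character, List<User>> index = new HashMap<>();

        // Build the index once, after loading all users from the DB.
        public void add(User user) {
            char firstLetter = Character.toUpperCase(user.getFirstName().charAt(0));
            index.computeIfAbsent(firstLetter, k -> new ArrayList<>()).add(user);
        }

        // Lookup: fetch the bucket for the first letter, then scan it linearly.
        public List<User> searchByFirstName(String firstName) {
            char firstLetter = Character.toUpperCase(firstName.charAt(0));
            List<User> result = new ArrayList<>();
            for (User u : index.getOrDefault(firstLetter, new ArrayList<>())) {
                if (u.getFirstName().equalsIgnoreCase(firstName)) {
                    result.add(u);
                }
            }
            return result;
        }
    }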

The map takes a huge amount of memory, I guess (380,000 User objects); I already had to increase the heap size. I want to make the search faster, for example by using the first name as the map key (there are many people with the same first name).

I have two solutions in mind:

1 - Still use a map, but with the first name as the key (increasing the heap size again?) - see the sketch after this list.
2 - Use files on disk instead of the map (Alba.dat would contain all the Albas, for example) and open the right file for each search. No need to increase the heap size, but are there any side effects?
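For option 1, something like this is what I have in mind (a rough sketch reusing the placeholder User class from the sketch above, assuming first names are normalized to lower case as keys):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Option 1: key the map by the full (lower-cased) first name instead of its first letter,
    // so a lookup is a single map.get() with no linear scan over a large bucket.
    public class FirstNameIndex {

        private final Map<String, List<User>> byFirstName = new HashMap<>();

        public void add(User user) {
            String key = user.getFirstName().toLowerCase();
            byFirstName.computeIfAbsent(key, k -> new ArrayList<>()).add(user);
        }

        public List<User> searchByFirstName(String firstName) {
            return byFirstName.getOrDefault(firstName.toLowerCase(), new ArrayList<>());
        }
    }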

Which one is better? (pros and cons)

Update with more info

It's a database of customers who call our customer service on the phone. The person who takes the call has to search by the customer's name (usually first name, then last name). Searching the DB directly is too slow. The solution I've implemented is already much faster (1/2 seconds vs 26 seconds using the DB), but I want to improve it.

Upvotes: 3

Views: 142

Answers (2)

cruftex

Reputation: 5723

There are several issues with your approach:

  • It assumes that the number of users doesn't change; a good application design would handle any number of users without software changes.
  • It assumes that the current problem is the only one. What happens if the next requirement to implement is "search by caller ID" or "search by zip code"?
  • It reinvents the wheel: you are effectively starting to write a database, index, or information-retrieval solution (however you want to name it) from scratch.

The right thing to do is to export the user data into a database engine that provides proper search capabilities. The export/extraction can hopefully be sped up if you have modification timestamps, or if you can intercept updates and reapply them to your search index.

What you use for your search does not matter too much; a simple database on a modern system is fast enough. Most also provide indexing capabilities to speed up your search. If you want something that can be embedded in your application, is specialized for search, and solves the problems above, I'd recommend using Lucene.
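As a rough idea of what that looks like, here is a minimal Lucene sketch (the field names, index path, and sample data are just assumptions for illustration):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    import java.nio.file.Paths;

    public class UserSearchIndex {

        public static void main(String[] args) throws Exception {
            StandardAnalyzer analyzer = new StandardAnalyzer();
            Directory dir = FSDirectory.open(Paths.get("user-index"));

            // Indexing: one Lucene document per user, done once after reading the DB.
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                Document doc = new Document();
                doc.add(new TextField("firstname", "Alba", Field.Store.YES));
                doc.add(new TextField("lastname", "Smith", Field.Store.YES));
                writer.addDocument(doc);
            }

            // Searching: parse the typed name into a query and fetch the top matches.
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                Query query = new QueryParser("firstname", analyzer).parse("alba");
                TopDocs hits = searcher.search(query, 10);
                for (ScoreDoc sd : hits.scoreDocs) {
                    Document hit = searcher.doc(sd.doc);
                    System.out.println(hit.get("firstname") + " " + hit.get("lastname"));
                }
            }
        }
    }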

Upvotes: 0

Duong Nguyen

Reputation: 850

IMHO, you don't have to cache all the entries in memory, only a part of them:

  • Maybe just use a ring buffer, or
  • More complicated, but it makes more sense: implement an LFU cache that keeps only the N most frequently accessed items. See this question for a hint on how to implement such a cache (a simplified sketch follows below).
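As a starting point, here is a minimal sketch of a size-bounded cache keyed by first name (using the question's User type). It relies on LinkedHashMap's access order, so it evicts the least recently used entry rather than the least frequently used one, but the idea of keeping only the N "hottest" names in memory is the same; the capacity is just an example:

    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    // Size-bounded cache: LinkedHashMap with accessOrder=true evicts the least
    // recently used entry once the capacity is exceeded.
    // (An LFU cache, as suggested above, would evict by access count instead.)
    public class NameCache extends LinkedHashMap<String, List<User>> {

        private final int capacity;

        public NameCache(int capacity) {
            super(16, 0.75f, true); // accessOrder = true
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<String, List<User>> eldest) {
            return size() > capacity;
        }
    }

On a cache miss you would fall back to the (slower) DB query and put the result into the cache, so only the most popular names stay in memory.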

Upvotes: 2
