Amit Gupta

Reputation: 471

How to keep a very large dictionary loaded in memory in Python?

I have a very large dictionary, roughly 200 GB in size, which I need to query very often for my algorithm. To get quick results, I want to keep it in memory, which is possible because, fortunately, I have 500 GB of RAM.

However, my main issue is that I want to load it into memory only once and then let other processes query the same dictionary, rather than having to reload it every time I create a new process or iterate over my code.

So, I would like something like this:

Script 1:

 # Load the dictionary into memory once
 def load(data_dir):
     dictionary = load_from_dir(data_dir)
     ...

Script 2:

 # Connect to loaded dictionary (already put in memory by script 1)
 def use_dictionary(my_query):
     query_loaded_dictionary(my_query) 

What's the best way to achieve this? I have considered a REST API, but I wonder if going over a REST request will erode all the speed I gained by putting the dictionary in memory in the first place.

Any suggestions?

Upvotes: 0

Views: 1483

Answers (1)

JulienD

Reputation: 7293

Either run a separate service that you access with a REST API, as you mentioned, or use an in-memory database.
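For concreteness, here is a minimal sketch of the service route. Flask is an assumption on my part (any web framework would do), and the small literal dict stands in for your load_from_dir; the point is that the dictionary is built once, when the server process starts, and every other process then queries it over HTTP:

 # Sketch of a lookup service: the dictionary lives in this process
 # and other processes query it over HTTP. Flask is assumed here.
 from flask import Flask, abort, jsonify

 app = Flask(__name__)

 # Stand-in for load_from_dir(data_dir); this runs once at startup.
 dictionary = {"alpha": 1, "beta": 2}

 @app.route("/get/<key>")
 def get_value(key):
     if key not in dictionary:
         abort(404)  # unknown key
     return jsonify(value=dictionary[key])

 if __name__ == "__main__":
     app.run()  # then query e.g. http://localhost:5000/get/alpha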

Personally, I had a very good experience with Redis, but there are many alternatives (Memcached is also popular). Redis was easy to use with Python and Django.

Both solutions involve data serialization, though, so some performance will be lost. There is a way to fill Redis with simple structures such as lists, but I haven't tried it. I packed my numeric arrays and serialized them with numpy, and it was fast enough in the end. If you only use simple string key-value pairs anyway, performance will be optimal, and possibly even better with Memcached.
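To illustrate, here is a minimal sketch of the Redis route with the redis-py client, assuming a Redis server is already running locally on the default port. The numpy round-trip is the kind of packing I mean; plain strings need no serialization step at all:

 # Sketch of the Redis route, assuming a local server on the
 # default port (6379) and the redis-py client.
 import io

 import numpy as np
 import redis

 r = redis.Redis(host="localhost", port=6379)

 # Plain string key-value pairs go in as-is.
 r.set("name:42", "Amit")
 print(r.get("name:42"))  # b'Amit'

 # Numeric arrays must be serialized; np.save/np.load round-trips
 # the dtype and shape along with the raw data.
 arr = np.arange(8, dtype=np.float64)
 buf = io.BytesIO()
 np.save(buf, arr)
 r.set("array:example", buf.getvalue())

 restored = np.load(io.BytesIO(r.get("array:example")))
 assert (restored == arr).all()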

Upvotes: 1
