fsw

Reputation: 3695

What is the fastest cache for small and rarely changing data to use with Django?

I am in the process of switching from PHP to Python + Django and am looking for an equivalent of PHP's "array cache".

For small data sets from the DB, like "categories", that changed very rarely but were accessed very often, I was using an array cache.

http://www.mysqlperformanceblog.com/2006/08/09/cache-performance-comparison/

The concept was to generate PHP source containing the tree of categories; with the opcode cache turned on, it worked like embedding the data into the application sources. It was the fastest cache imaginable and very helpful under heavy load.

The Django manual (https://docs.djangoproject.com/en/1.4/topics/cache/) states:

By far the fastest, most efficient type of cache available to Django, Memcached..

So the questions are:

EDIT:

As pointed out in an answer, I can use repr(), and this can be benchmarked easily, so I have created a simple benchmark:

https://github.com/fsw/pythonCachesBenchmark

The output of this on my local machine was:

FIRST RUN
get_categories_from_db
6.57282209396
get_categories_from_memcached
(SET CACHE IN 0.000940)
4.88948512077
get_categories_from_pickledfile
(SET CACHE IN 0.000917)
2.87856888771
get_categories_from_pythonsrc
(SET CACHE IN 0.000489)
0.0930788516998
SECOND RUN
get_categories_from_db
6.63035202026
get_categories_from_memcached
4.60877108574
get_categories_from_pickledfile
2.87137699127
get_categories_from_pythonsrc
0.0903170108795

get_categories_from_pythonsrc is a simple implementation of the PHP array cache I was talking about:

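import os
import time

def get_categories_from_pythonsrc():
    # Generate the module once, on the first call without a cache file.
    if not os.path.exists('catcache.py'):
        start = time.time()
        categories = get_categories_from_db()
        with open('catcache.py', 'w') as f:
            f.write('x = ' + repr(categories))
        print('(SET CACHE IN %f)' % (time.time() - start))
    # After the first import, Python reuses the already-loaded module.
    import catcache
    return catcache.x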

This is my simple pickled-file cache implementation:

import os
import pickle
import time

def get_categories_from_pickledfile():
    path = 'catcache.p'
    if not os.path.exists(path):
        start = time.time()
        with open(path, 'wb') as f:
            pickle.dump(get_categories_from_db(), f)
        print('(SET CACHE IN %f)' % (time.time() - start))
    # Every call re-reads and unpickles the file.
    with open(path, 'rb') as f:
        return pickle.load(f)

Complete source:

https://github.com/fsw/pythonCachesBenchmark/blob/master/test.py

I will later add Django's low-level cache API to this benchmark to see how it compares.

So, as my intuition suggested, caching the dictionary in a Python .py file is the fastest way I could find (over 30 times faster than cPickle + file).

As I said, I am new to Python, so am I probably missing something here?

If not: why isn't this solution widely used?

Upvotes: 2

Views: 470

Answers (2)

grizwako

Reputation: 1563

There is one other approach. You could use an asynchronous server such as gevent and keep live objects in a global namespace.
I do not know how familiar you are with this workflow; it is different from the Apache/PHP model, where each request starts bare.

Basically, you load your application once and use it to serve requests. It stays alive all the time and sleeps when there are no requests. Once you load "categories" from the database, store them in a global variable or in some module.

Let's say that you launch a WSGI instance and name it app. Afterwards, you can just keep a dictionary in that app and store the cache there. There is no serialization and no network protocol; all the data is directly available in RAM.

EDIT1: Do NOT use globals often; this is just one of the very rare cases where it is OK to store something in the global namespace (in my opinion).

Upvotes: 1

Tadeck

Reputation: 137300

Python has several solutions that may work here:

  • Memcached (as you already know),
  • pickle (as Blender mentioned), which of course can be used together with e.g. Memcached,
  • several other caching (e.g. local memory) and serialization (e.g. simplejson) solutions.

In general, pickle is very fast (use cPickle if you need more speed), and in Python you do not need anything like var_export() (although you can use repr() on variables to get a valid literal, if they are of one of the primitive types). pickle in Python is more similar to serialize in PHP.

Your question is not very specific, but the above should give you some insight. You also need to take into account that PHP and Python have different philosophies, so solutions to the same problems may look different. In this specific case, the pickle module should solve your issues.
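To make the comparison concrete, here is a small sketch of both round trips, written for Python 3 (where the plain pickle module already uses the C implementation when available). The categories dictionary is made-up sample data, and ast.literal_eval is one safe way to read back a repr() literal; it is not what the question's benchmark does, which imports the generated file as a module.

```python
import ast
import pickle

# Made-up sample data standing in for the category tree.
categories = {
    1: {"name": "books", "children": [3, 4]},
    2: {"name": "music", "children": []},
}

# pickle: binary serialization, comparable to serialize()/unserialize() in PHP.
blob = pickle.dumps(categories, protocol=pickle.HIGHEST_PROTOCOL)
from_pickle = pickle.loads(blob)

# repr(): yields a Python literal for built-in types, comparable to var_export().
source = repr(categories)
# literal_eval parses literals only; it does not execute arbitrary code.
from_literal = ast.literal_eval(source)
```

Both paths give back an equal dictionary. Only unpickle data you trust, since pickle.loads can execute code embedded in a malicious payload.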

Upvotes: 1
