Reputation: 1785
I am calculating some very large numbers using Python, and I'd like to store previously calculated results in Berkeley DB.
The problem is that Berkeley DB can only store strings, and my calculation results are tuples of integers.
For example, I get (m, n) as my result. One way is to store this as "%d,%d" % (m, n) and read it back using re. I can also store the tuple using pickle or marshal.
Which has better performance?
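To make the three options concrete, here is a minimal round-trip sketch. It assumes Python 3, where pickle and marshal produce bytes; the text form would need .encode() before being handed to a bytes-based Berkeley DB binding. The sample values and the regular expression are illustrative choices, not part of the question.

```python
import marshal
import pickle
import re

m, n = 2**100, 3**80  # illustrative large integers

# Option 1: text "m,n", parsed back with a regular expression
text = "%d,%d" % (m, n)
match = re.fullmatch(r"(-?\d+),(-?\d+)", text)
from_text = (int(match.group(1)), int(match.group(2)))

# Option 2: pickle serializes the tuple to bytes
from_pickle = pickle.loads(pickle.dumps((m, n)))

# Option 3: marshal also serializes the tuple to bytes
from_marshal = marshal.loads(marshal.dumps((m, n)))

print(from_text == from_pickle == from_marshal == (m, n))  # True
```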
Upvotes: 9
Views: 11899
Reputation: 2460
In Python 3.8, the speed comparison's results may differ from those shown in this answer.
Python 3.8.10 (default, May 4 2021, 00:00:00)
[GCC 10.2.1 20201125 (Red Hat 10.2.1-9)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import timeit
>>>
>>> timeit.timeit("pickle.dumps([1,2,3])","import pickle",number=10000)
0.005186535003304016
>>> timeit.timeit("json.dumps([1,2,3])","import json",number=10000)
0.03863359600654803
>>> timeit.timeit("marshal.dumps([1,2,3])","import marshal", number=10000)
0.00884882499667583
>>>
It seems that pickle is now somewhat faster than marshal (about 0.0052 s versus 0.0088 s for 10,000 dumps).
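The transcript above only times dumps of a small list. Since the question stores a tuple of very large integers and reads it back, a round-trip timing on that shape may be more representative. This is a minimal sketch; the sample values are arbitrary, and the numbers it prints will vary by Python version and machine.

```python
import marshal
import pickle
import timeit

result = (2**200, 3**150)  # arbitrary tuple of large ints, as in the question

def pickle_round_trip():
    # dumps + loads, using the newest pickle protocol available
    return pickle.loads(pickle.dumps(result, protocol=pickle.HIGHEST_PROTOCOL))

def marshal_round_trip():
    return marshal.loads(marshal.dumps(result))

# Sanity check: both must reproduce the original tuple before timing them.
assert pickle_round_trip() == result
assert marshal_round_trip() == result

for name, fn in [("pickle", pickle_round_trip), ("marshal", marshal_round_trip)]:
    seconds = timeit.timeit(fn, number=10000)
    print(f"{name}: {seconds:.4f} s for 10000 round trips")
```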
Upvotes: 2
Reputation: 3919
When thinking about performance, you should keep 3 things in mind:
For example, here are the results of my benchmark:
jimilian$ python3.5 serializators.py
iterations= 100000
data= 'avzvasdklfjhaskldjfhkweljrqlkjb*@&$Y)(!#&$G@#lkjabfsdflb(*!G@#$(GKLJBmnz,bv(PGDFLKJ'
==== DUMP ====
Pickle:
>> 0.09806302400829736
Json: 2.0.9
>> 0.12253901800431777
Marshal: 4
>> 0.09477431800041813
Msgpack: (0, 4, 7)
>> 0.16701826300413813
==== LOAD ====
Pickle:
>> 0.10376790800364688
Json: 2.0.9
>> 0.30041573599737603
Marshal: 4
>> 0.034003349996055476
Msgpack: (0, 4, 7)
>> 0.061493027009419166
jimilian$ python3.5 serializators.py
iterations= 100000
data= [1,2,3]*100
==== DUMP ====
Pickle:
>> 0.9678693519963417
Json: 2.0.9
>> 4.494351467001252
Marshal: 4
>> 0.8597690019960282
Msgpack: (0, 4, 7)
>> 1.2778299400088144
==== LOAD ====
Pickle:
>> 1.0350999219954247
Json: 2.0.9
>> 3.349724347004667
Marshal: 4
>> 0.468191737003508
Msgpack: (0, 4, 7)
>> 0.3629750510008307
jimilian$ python2.7 serializators.py
iterations= 100000
data= [1,2,3]*100
==== DUMP ====
Pickle:
>> 50.5894570351
Json: 2.0.9
>> 2.69190311432
cPickle: 1.71
>> 5.14689707756
Marshal: 2
>> 0.539206981659
Msgpack: (0, 4, 7)
>> 0.752672195435
==== LOAD ====
Pickle:
>> 58.8052768707
Json: 2.0.9
>> 3.50090789795
cPickle: 1.71
>> 8.46298909187
Marshal: 2
>> 0.469168901443
Msgpack: (0, 4, 7)
>> 0.315001010895
So, as you can see, sometimes pickle is a good choice (Python 3, long string, dump, where it is nearly tied with marshal), sometimes msgpack wins (Python 3, long array, load), and in Python 2 things work completely differently. That's why nobody can give a definitive answer that is valid for everybody.
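The serializators.py script itself is not shown, so the following is only a hypothetical harness in the same spirit: it times dump and load separately for each serializer on a given input. msgpack is a third-party package and is assumed optional here; it is skipped if not installed.

```python
import json
import marshal
import pickle
import timeit

try:
    import msgpack  # third-party; optional
except ImportError:
    msgpack = None

def build_serializers():
    """Map a name to a (dump, load) pair of callables."""
    serializers = {
        "pickle": (pickle.dumps, pickle.loads),
        "json": (json.dumps, json.loads),
        "marshal": (marshal.dumps, marshal.loads),
    }
    if msgpack is not None:
        serializers["msgpack"] = (msgpack.packb, msgpack.unpackb)
    return serializers

def benchmark(data, iterations=10000):
    """Return {name: (dump_seconds, load_seconds)} for each serializer."""
    results = {}
    for name, (dump, load) in build_serializers().items():
        blob = dump(data)  # serialize once so load has something to parse
        # timeit runs immediately, so the lambdas see this iteration's dump/load
        dump_time = timeit.timeit(lambda: dump(data), number=iterations)
        load_time = timeit.timeit(lambda: load(blob), number=iterations)
        results[name] = (dump_time, load_time)
    return results

if __name__ == "__main__":
    for data in ("avzvasdklfjhaskldjfhkweljrqlkjb" * 3, [1, 2, 3] * 100):
        print("data =", repr(data)[:40])
        for name, (d, l) in benchmark(data).items():
            print(f"  {name:8} dump {d:.4f} s   load {l:.4f} s")
```

Running it on your own data and interpreter is the point: as the results above show, the ranking changes with the input shape and the Python version.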
Upvotes: 10
Reputation: 526713
For pure speed, marshal will get you the fastest results.
Timings:
>>> timeit.timeit("pickle.dumps([1,2,3])","import pickle",number=10000)
0.2939901351928711
>>> timeit.timeit("json.dumps([1,2,3])","import json",number=10000)
0.09756112098693848
>>> timeit.timeit("pickle.dumps([1,2,3])","import cPickle as pickle",number=10000)
0.031056880950927734
>>> timeit.timeit("marshal.dumps([1,2,3])","import marshal", number=10000)
0.00703883171081543
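Note that the cPickle line is Python 2; in Python 3, import pickle already uses the C accelerator when it is available. One trade-off to weigh against marshal's speed: according to the Python documentation, marshal only handles core built-in types and its format may change between Python versions, whereas pickle can serialize arbitrary class instances. A minimal check of the type restriction (Fraction is just a convenient example of a non-core type):

```python
import marshal
import pickle
from fractions import Fraction

result = (2**100, 3**80)
# Tuples of ints are supported by both modules.
assert marshal.loads(marshal.dumps(result)) == result
assert pickle.loads(pickle.dumps(result)) == result

# A class instance such as Fraction is not a type marshal supports.
try:
    marshal.dumps(Fraction(1, 3))
    marshal_ok = True
except ValueError:
    marshal_ok = False

# pickle can round-trip it.
pickle_ok = pickle.loads(pickle.dumps(Fraction(1, 3))) == Fraction(1, 3)
print(marshal_ok, pickle_ok)  # False True
```

For plain integer tuples written and read by the same Python version, marshal's restrictions do not get in the way.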
Upvotes: 18
Reputation: 184211
Check out shelve, a simple persistent key-value store with a dictionary-like API that uses pickle to serialize objects.
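A minimal caching sketch with shelve, assuming results are keyed by the input value. compute is a hypothetical placeholder for the expensive calculation, and shelve keys must be strings, hence str(k).

```python
import os
import shelve
import tempfile

def compute(k):
    # Hypothetical stand-in for the expensive calculation.
    return (k ** 50, k ** 60)

def cached_compute(db, k):
    key = str(k)  # shelve keys must be str
    if key not in db:
        db[key] = compute(k)  # value is pickled transparently
    return db[key]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results")
    with shelve.open(path) as db:
        first = cached_compute(db, 7)
    # Reopen to show the result persisted on disk.
    with shelve.open(path) as db:
        again = db["7"]
    print(first == again)  # True
```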
Upvotes: 1
Reputation: 488453
Time them and find out!
I'd expect cPickle to be the fastest but that's no guarantee.
Upvotes: 3