user910210

Reputation: 313

Why is dumping with `pickle` much faster than `json`?

This is for Python 3.6.

Edited and removed a lot of stuff that turned out to be irrelevant.

I had thought json was faster than pickle, and other answers and comments on Stack Overflow suggest that a lot of other people believe this as well.

Is my test kosher? The disparity is much larger than I expected. I get the same results testing on very large objects.

import json
import pickle
import timeit

num_tests = 100000

obj = {1: 1}

command = 'pickle.dumps(obj)'
setup = 'from __main__ import pickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("pickle: %f seconds" % result)

command = 'json.dumps(obj)'
setup = 'from __main__ import json, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("json:   %f seconds" % result)

and the output:

pickle: 0.054130 seconds
json:   0.467168 seconds

Upvotes: 3

Views: 7962

Answers (3)

Ahmed Abobakr

Reputation: 1666

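I tried several methods based on your code snippet and found that cPickle with the protocol argument set explicitly, cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL), is the fastest dump method. Note that this test uses Python 2: cPickle does not exist in Python 3, where pickle uses the C implementation automatically when it is available.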

import msgpack
import json
import pickle
import timeit
import cPickle
import numpy as np

num_tests = 10

obj = np.random.normal(0.5, 1, [240, 320, 3])

command = 'pickle.dumps(obj)'
setup = 'from __main__ import pickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("pickle:  %f seconds" % result)

command = 'cPickle.dumps(obj)'
setup = 'from __main__ import cPickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("cPickle:   %f seconds" % result)


command = 'cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL)'
setup = 'from __main__ import cPickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("cPickle highest:   %f seconds" % result)

command = 'json.dumps(obj.tolist())'
setup = 'from __main__ import json, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("json:   %f seconds" % result)


command = 'msgpack.packb(obj.tolist())'
setup = 'from __main__ import msgpack, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("msgpack:   %f seconds" % result)

Output:

pickle         :   0.847938 seconds
cPickle        :   0.810384 seconds
cPickle highest:   0.004283 seconds
json           :   1.769215 seconds
msgpack        :   0.270886 seconds
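
For Python 3, which the question targets, a rough equivalent of this comparison can be sketched with the standard library alone. This is only a sketch under assumptions: the nested list `data` is a hypothetical stand-in for the NumPy array above, `time_dump` is a helper name made up for illustration, and protocol 0 is used to mimic the old text-based default of Python 2. Timings will vary by machine, so none are shown.

```python
import json
import pickle
import timeit

# Hypothetical test data: a nested list of floats standing in for the array above.
data = [[float(i * j) for j in range(100)] for i in range(100)]


def time_dump(func, number=20):
    """Return the total seconds taken by `number` calls to func()."""
    return timeit.timeit(func, number=number)


timings = {
    # Protocol 0 is the original text-based pickle format.
    "pickle protocol 0": time_dump(lambda: pickle.dumps(data, protocol=0)),
    # The newest binary protocol supported by this interpreter.
    "pickle highest": time_dump(
        lambda: pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
    ),
    "json": time_dump(lambda: json.dumps(data)),
}

for label, seconds in timings.items():
    print("%-18s: %f seconds" % (label, seconds))
```

Passing a callable to timeit.timeit avoids the setup string with `from __main__ import ...`, at the cost of a small function-call overhead in every measurement.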

Upvotes: 6

yar

Reputation: 1915

JSON serialises to a human-readable text format, while pickle serialises to a binary representation. Nevertheless, pickle is often pretty slow. There are variants like cPickle that are faster. If you want even faster serialisation, use msgpack.
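
The difference between the two formats can be seen with the object from the question. This is a minimal sketch using only the standard library, and the variable names are chosen just for illustration.

```python
import json
import pickle

obj = {1: 1}

as_json = json.dumps(obj)      # a str containing readable text
as_pickle = pickle.dumps(obj)  # a bytes object in pickle's binary format

print(type(as_json).__name__, as_json)
print(type(as_pickle).__name__, len(as_pickle), "bytes")

# JSON object keys are always strings, so the int key comes back as "1".
from_json = json.loads(as_json)
# pickle restores the original Python types.
from_pickle = pickle.loads(as_pickle)

print(from_json)
print(from_pickle)
```

The round trip shows that the two modules are not doing the same work: json has to turn the integer key into a string, while pickle records the Python types as they are.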

Upvotes: 0

pan8863

Reputation: 733

How many times did you run the benchmark? In any case, you need to remove the random delays introduced by thread blocking and similar effects. You can do so by running your benchmark a sufficiently high number of times. Also, your input is too small to outweigh the fixed overhead of the 'boilerplate' code around each call.
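
One way to follow this advice is to repeat the whole measurement several times and keep the fastest run, using timeit.repeat. The sketch below is an assumption about how that could look, not a definitive benchmark: the larger `obj` and the helper name `best_time` are made up for illustration.

```python
import json
import pickle
import timeit

# Hypothetical larger input so that per-call overhead matters less.
obj = {str(i): list(range(10)) for i in range(1000)}


def best_time(func, repeat=5, number=50):
    """Run `number` calls, `repeat` times over, and return the fastest run in seconds."""
    return min(timeit.repeat(func, repeat=repeat, number=number))


pickle_best = best_time(lambda: pickle.dumps(obj))
json_best = best_time(lambda: json.dumps(obj))

print("pickle: %f seconds (best of 5)" % pickle_best)
print("json:   %f seconds (best of 5)" % json_best)
```

Taking the minimum rather than the mean is a design choice: slower runs are usually caused by other processes interfering, so the fastest run is the closest estimate of what the code itself costs.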

Upvotes: -1
