Reputation: 3820
I am storing pickled objects in a PostgreSQL database. Originally I thought this was a bad idea, but they are seldom accessed and from what I've learned apparently TOAST makes the performance impact minimal for storing big blobs in a relational database.
When you INSERT
or UPDATE
a bytea
column it is simple. Just construct a psycopg2.Binary
and pass it to the execute
call on the cursor object. In my case, it's a pickled object.
Whenever you do a SELECT
and get back a bytea
column you wind up with a python buffer
object. In other words, you can't just do a pickle.loads
or a pickle.load
. The best I've come up with is using StringIO
import psycopg2
import cPickle as pickle
import cStringIO as StringIO
conn = psycopg2.connect(user='postgres',database='postgres')
cur = conn.cursor()
cur.execute('Select %s', (psycopg2.Binary(pickle.dumps({'foo':'bar'},-1)), ))
result, = cur.fetchone()
cur.close()
conn.rollback()
result = StringIO.StringIO(result)
print pickle.load(result)
What's the cost of this? Are the StringIO
objects just shallow copies of the original buffer
object? Is there a more practical way of doing this?
I'm using Stackless 2.7.5 if it matters.
Upvotes: 4
Views: 1978
Reputation: 3820
It turns out that using cStringIO and cPickle is many times faster and is the best option in this case. Full writeup here http://www.hydrogen18.com/blog/unpickling-buffers.html
Upvotes: 2
Reputation: 117520
I don't think you need to create StringIO
object, you can just create str
from your result
and then read it:
>>> pickle.loads(str(result))
{'foo': 'bar'}
Don't know about cStringIO
, but StringIO
do this anyway (taken from StringIO
):
def __init__(self, buf = ''):
# Force self.buf to be a string or unicode
if not isinstance(buf, basestring):
buf = str(buf)
update: tested it with timeit
:
>>> timeit('s = pickle.loads(str(result))', 'import cPickle as pickle;from __main__ import result', number=100000)
1.2336693825245675
>>> timeit('s = pickle.load(StringIO.StringIO(result))', 'import cPickle as pickle; import cStringIO as StringIO;from __main__ import result', number=100000)
1.0089504222504786
So your approach a bit faster than creating string from buffer
Upvotes: 1