Unpickling bytea columns from Psycopg2

Question

I am storing pickled objects in a PostgreSQL database. Originally I thought this was a bad idea, but they are seldom accessed and from what I've learned apparently TOAST makes the performance impact minimal for storing big blobs in a relational database.

When you INSERT or UPDATE a bytea column it is simple. Just construct a psycopg2.Binary and pass it to the execute call on the cursor object. In my case, it's a pickled object.

Whenever you do a SELECT and get back a bytea column you wind up with a python buffer object. In other words, you can't just do a pickle.loads or a pickle.load. The best I've come up with is using StringIO

import psycopg2
import cPickle as pickle
import cStringIO as StringIO

conn = psycopg2.connect(user='postgres',database='postgres')

cur = conn.cursor()

cur.execute('Select %s', (psycopg2.Binary(pickle.dumps({'foo':'bar'},-1)), ))

result, = cur.fetchone()

cur.close()
conn.rollback()

result = StringIO.StringIO(result)

print pickle.load(result)

What's the cost of this? Are the StringIO objects just shallow copies of the original buffer object? Is there a more practical way of doing this?

I'm using Stackless 2.7.5 if it matters.

Eric Urban · Accepted Answer

It turns out that using cStringIO and cPickle is many times faster and is the best option in this case. Full writeup here http://www.hydrogen18.com/blog/unpickling-buffers.html

Unpickling bytea columns from Psycopg2

Answers (2)

Related Questions