Paul

Reputation: 3091

A good persistent synchronous queue in Python

I don't immediately care about FIFO or FILO options, but it might be nice in the future.

What I'm looking for is a nice, fast, simple way to store (at most a gig of data, or tens of millions of entries) on disk that can be put to and got from by multiple processes. The entries are just simple 40-byte strings, not Python objects. I don't really need all the functionality of shelve.

I've seen this: http://code.activestate.com/lists/python-list/310105/ — it looks simple, but it would need to be upgraded to the new Queue version.

Wondering if there's something better? I'm concerned that in the event of a power interruption, the entire pickled file would become corrupt instead of just one record.
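For context, here is a rough sketch of what that ActiveState recipe's approach amounts to (class and method names are illustrative, not taken from the recipe): a Queue subclass that re-pickles its entire backing deque on every put and get, which is exactly why a power cut mid-write can corrupt the whole file rather than one record.

```python
import os
import pickle
import queue  # "Queue" on Python 2


class PickledQueue(queue.Queue):
    """Illustrative only: re-pickles the whole queue on every change.

    Note it is only thread-safe, not process-safe, which is part of
    what the question is asking to improve on.
    """

    def __init__(self, path, maxsize=0):
        self.path = path
        super().__init__(maxsize)
        if os.path.exists(path):
            with open(path, 'rb') as f:
                self.queue = pickle.load(f)   # restore the saved deque

    def _save(self):
        # The entire deque is rewritten each time, so losing power
        # mid-write risks corrupting every record, not just one.
        with open(self.path, 'wb') as f:
            pickle.dump(self.queue, f)

    def _put(self, item):
        super()._put(item)
        self._save()

    def _get(self):
        item = super()._get()
        self._save()
        return item
```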

Upvotes: 12

Views: 11671

Answers (4)

bwdm

Reputation: 818

This is a very old question, but persist-queue seems to be a nice tool for this kind of task.

persist-queue implements a file-based queue and a series of SQLite3-based queues. The goal is to meet the following requirements:

  • Disk-based: each queued item is stored on disk so it survives a crash.
  • Thread-safe: can be used by multi-threaded producers and multi-threaded consumers.
  • Recoverable: Items can be read after process restart.
  • Green-compatible: can be used in greenlet or eventlet environment.

By default, persist-queue uses the pickle module for object serialization, so it can store object instances. Most built-in types, like int, dict, and list, can be persisted by persist-queue directly; to support custom objects, refer to Pickling and unpickling extension types (Python 2) and Pickling Class Instances (Python 3).
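A minimal usage sketch with the SQLite-backed queue (the path and the auto_commit flag are just a typical setup, not something from the question):

```python
import persistqueue

# Items are committed to the SQLite file on each put/get when
# auto_commit=True, so they survive a process restart.
q = persistqueue.SQLiteQueue('/tmp/queue-data', auto_commit=True)

q.put('a-40-byte-string-entry')
item = q.get()   # retrieve the oldest item
```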

Upvotes: 7

sw.

Reputation: 3231

I think that PyBSDDB is what you want. You can choose a queue as the access type. PyBSDDB is a Python module based on Oracle Berkeley DB. It offers synchronous access, and the underlying library can be used from different processes, although I don't know whether that is exposed through the Python bindings. On multiple processes writing to the db, I found this thread.
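A rough sketch of what the queue access type looks like through the bsddb3 bindings; I haven't verified this against a running install, and the exact return shape of consume() may differ:

```python
from bsddb3 import db

q = db.DB()
q.set_re_len(40)   # queue databases use fixed-length records; 40 bytes matches the question

# DB_QUEUE is Berkeley DB's queue access method
q.open('work.queue', dbtype=db.DB_QUEUE, flags=db.DB_CREATE)

q.append(b'x' * 40)     # enqueue one 40-byte record
head = q.consume()      # pop the head record; None when the queue is empty
q.close()
```

For multiple processes you would additionally open the database through a shared DBEnv with locking (or transactions) enabled, which is where the thread linked above becomes relevant.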

Upvotes: 3

Pavel Shvedov

Reputation: 1314

Try using Celery. It's not pure Python, as it uses RabbitMQ as a backend, but it's reliable, persistent and distributed, and, all in all, far better than using files or a database in the long run.
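A minimal sketch of what that looks like (the broker URL, module name and task body are placeholders):

```python
# tasks.py
from celery import Celery

# RabbitMQ persists queued messages until a worker acknowledges them
app = Celery('tasks', broker='amqp://guest@localhost//')


@app.task
def handle_entry(entry):
    # entry would be one of the 40-byte strings from the question
    print('processing', entry)
```

Producers then enqueue with `handle_entry.delay('some-40-byte-string')`, and a worker started with `celery -A tasks worker` consumes the entries.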

Upvotes: 4

Christopher Mahan

Reputation: 7619

Would using files not work?

Use a journaling file system to recover from power interruptions. That's their purpose.
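If you do go the plain-file route, one way to limit damage from a power cut is to append fixed-length records and fsync after each write, so at worst the final record is torn; a minimal sketch (file layout and names are illustrative, not a full queue):

```python
import os

RECORD_LEN = 40  # the question's fixed entry size


def append_record(path, record: bytes):
    # Pad/truncate to the fixed length so a torn write can only ever
    # damage the last record, never earlier ones.
    data = record.ljust(RECORD_LEN, b' ')[:RECORD_LEN]
    with open(path, 'ab') as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # ask the OS to push the record to disk now
```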

Upvotes: -3
