I wish to use shelve in an asyncio program and I fear that every change will cause the main event loop to stall.
While I don't mind the occasional slowdown of the pickling operation, the disk writes may be substantial.
How often does shelve sync to disk? Is it a blocking operation? Do I have to call .sync()?
If I schedule the sync() to run on a different thread, a different asyncio task may modify the shelve at the same time, which violates the requirement of single-thread writes.
shelve, by default, is backed by the dbm module, which is in turn backed by some dbm implementation available on the system. Neither the shelve module nor the dbm module makes any effort to minimize writes; an assignment of a value to a key causes a write every time. Even when writeback=True, that just means that new assignments are placed in the cache and also immediately written to the backing dbm. They're written to make sure the original value is there, and the cache entry is made because the object assigned might change after assignment and needs to be handled just like a freshly read object (meaning it will be written again when synced or closed, in case it changed).
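To observe this without a real dbm file, here is a small illustration (not part of the original answer). It wraps a plain dict in shelve.Shelf, which accepts any mapping as its backing store, so we can inspect what gets written. The key "config" and its values are arbitrary; the point is that with writeback=True the assignment is pickled into the backing store at once, while a later in-place change to the cached object only reaches the store on sync().

```python
import pickle
import shelve

# A plain dict stands in for the dbm file, so we can inspect exactly
# what the shelf has written to its backing store (keys are stored as bytes).
backing = {}
shelf = shelve.Shelf(backing, writeback=True)

shelf["config"] = {"retries": 3}
# The assignment was pickled into the backing store immediately ...
stored_at_assignment = pickle.loads(backing[b"config"])
print(stored_at_assignment)  # {'retries': 3}

# ... and the same object was kept in the in-memory cache, so an in-place
# change is visible through the shelf but not yet in the backing store.
shelf["config"]["retries"] = 5
stored_before_sync = pickle.loads(backing[b"config"])
print(stored_before_sync)  # {'retries': 3}

# sync() writes every cached entry back to the backing store.
shelf.sync()
stored_after_sync = pickle.loads(backing[b"config"])
print(stored_after_sync)  # {'retries': 5}
```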
While it's possible that some implementations of the underlying dbm libraries include some caching, AFAICT most do try to write immediately (that is, pushing data to the kernel immediately without user-mode buffering); they just don't necessarily force immediate synchronization to disk (though it can be requested, e.g. with gdbm_sync).
writeback=True will make it worse, because when it does sync, it's a major effort (it literally rewrites every object read or written to the DB since the last sync, because it has no way of knowing which of them might have been modified), as opposed to the small effort of rewriting a single key/value pair at a time.
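A rough way to see that cost (an illustrative sketch, not from the answer): back a shelve.Shelf with a hypothetical CountingDict that counts every value stored into it, read 100 entries without modifying any of them, then call sync(). The entry count and the list values are arbitrary choices for the demonstration.

```python
import shelve


class CountingDict(dict):
    """Hypothetical backing store that counts how many values get written."""

    def __init__(self):
        super().__init__()
        self.writes = 0

    def __setitem__(self, key, value):
        self.writes += 1
        super().__setitem__(key, value)


def writes_after_reads(writeback, n=100):
    backing = CountingDict()
    loader = shelve.Shelf(backing)
    for i in range(n):
        loader[str(i)] = list(range(10))
    backing.writes = 0  # only count what happens from here on

    shelf = shelve.Shelf(backing, writeback=writeback)
    for i in range(n):
        _ = shelf[str(i)]  # read only; nothing is modified
    shelf.sync()
    return backing.writes


print(writes_after_reads(writeback=False))  # 0
print(writes_after_reads(writeback=True))   # 100: every entry read is rewritten
```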
In short, if you're really concerned about blocking writes, you can't use unthreaded async code without potential blocking, but said blocking is likely short-lived as long as writeback=True is not involved (or as long as you don't sync/close it until performance considerations are no longer relevant). If you need truly non-blocking async behavior, all shelve interactions will need to occur under a lock in worker threads, and either writeback must be False (to avoid race conditions pickling data) or, if writeback is True, you must take care to avoid modifying any object that might be in the cache during the sync/close.
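One way to follow that advice is sketched below. It is an assumption-laden illustration, not code from the answer: ThreadedShelf is a hypothetical helper name, and instead of an explicit lock it sends every shelve call, including open and close, to a single dedicated worker thread via loop.run_in_executor. One worker serializes the calls the same way a lock would, and it also keeps the dbm handle on one OS thread, which some backends (such as the sqlite3-based dbm added in Python 3.13) require. The shelf is opened with the default writeback=False, as recommended above.

```python
import asyncio
import concurrent.futures
import os
import shelve
import tempfile


class ThreadedShelf:
    """Hypothetical helper: every shelve call runs on one dedicated worker thread."""

    def __init__(self, filename):
        # max_workers=1: calls run one at a time, in submission order,
        # always on the same OS thread, so no explicit lock is needed.
        self._executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        self._filename = filename
        self._shelf = None

    async def _run(self, fn, *args):
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self._executor, fn, *args)

    async def open(self):
        # writeback=False (the default): nothing is cached, nothing to race on at close
        self._shelf = await self._run(shelve.open, self._filename)

    async def get(self, key, default=None):
        return await self._run(self._shelf.get, key, default)

    async def set(self, key, value):
        # value is pickled on the worker thread, so other tasks must not
        # mutate it until this call has returned.
        await self._run(self._shelf.__setitem__, key, value)

    async def close(self):
        await self._run(self._shelf.close)
        self._executor.shutdown(wait=True)


async def main():
    shelf = ThreadedShelf(os.path.join(tempfile.mkdtemp(), "demo"))
    await shelf.open()
    await shelf.set("counter", 1)
    value = await shelf.get("counter")
    missing = await shelf.get("absent")
    await shelf.close()
    return value, missing


result = asyncio.run(main())
print(result)  # (1, None)
```

The event loop only awaits futures here; the pickling and the dbm writes happen on the worker thread, at the price of one thread hop per operation.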
It writes to disk every time you update the shelve object itself. So if you do shelf[key] = something or shelf.update(somedict), it will write to the file.
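As a small illustration (not from the answer), a shelve.Shelf over a hypothetical CountingDict shows that a single assignment is one write and that update() is simply one write per key, since it assigns each item in turn. The keys and values are arbitrary.

```python
import shelve


class CountingDict(dict):
    """Hypothetical stand-in for the dbm file: counts every stored value."""

    def __init__(self):
        super().__init__()
        self.writes = 0

    def __setitem__(self, key, value):
        self.writes += 1
        super().__setitem__(key, value)


backing = CountingDict()
shelf = shelve.Shelf(backing)

shelf["a"] = 1
writes_after_assignment = backing.writes
print(writes_after_assignment)  # 1

shelf.update({"b": 2, "c": 3, "d": 4})
print(backing.writes)  # 4: update() wrote once per key
```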
However, if there are mutable values in the dictionary, modifying them will not trigger a write to the file. Objects in Python don't have any reference back to the containers that reference them, so there's no way for the shelve object to detect those changes and write the file. If you need to support mutable values in the dictionary, you should use the writeback=True option when creating the shelve to create an in-memory cache; the file will then be updated whenever you sync() or close().
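The difference can be seen with a real shelf file. In this minimal sketch the temporary path and the "items" key are arbitrary; it shows an in-place mutation being lost with the default writeback=False, the explicit reassignment that fixes it, and writeback=True persisting the mutation on close().

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo")

with shelve.open(path) as shelf:  # writeback=False (the default)
    shelf["items"] = [1, 2]
    shelf["items"].append(3)  # mutates a freshly unpickled copy; nothing is written
    after_mutation = shelf["items"]
    print(after_mutation)  # [1, 2]

    items = shelf["items"]
    items.append(3)
    shelf["items"] = items  # explicit reassignment: written to the file
    after_reassign = shelf["items"]
    print(after_reassign)  # [1, 2, 3]

with shelve.open(path, writeback=True) as shelf:
    shelf["items"].append(4)  # mutates the cached object
# leaving the with block calls close(), which writes the cache back

with shelve.open(path) as shelf:
    after_writeback = shelf["items"]
    print(after_writeback)  # [1, 2, 3, 4]
```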