This question is related to "How to pack blobstorage with Plone and RelStorage".
Using a ZODB database with RelStorage and SQLite as its backend, I am trying to remove unused blobs. Currently db.pack does not remove the blobs from disk. The minimal working example below demonstrates this behavior:
import logging
import numpy as np
import os
import persistent
from persistent.list import PersistentList
import shutil
import time
from ZODB import config, blob

connectionString = """
%import relstorage
<zodb main>
    <relstorage>
        blob-dir ./blob
        keep-history false
        cache-local-mb 0
        <sqlite3>
            data-dir .
        </sqlite3>
    </relstorage>
</zodb>
"""


class Data(persistent.Persistent):
    def __init__(self, data):
        super().__init__()
        self.children = PersistentList()
        self.data = blob.Blob()
        with self.data.open("w") as f:
            np.save(f, data)


def main():
    logging.basicConfig(level=logging.INFO)

    # Initial cleanup
    for f in os.listdir("."):
        if f.endswith("sqlite3"):
            os.remove(f)
    if os.path.exists("blob"):
        shutil.rmtree("blob", True)

    # Initializing database
    db = config.databaseFromString(connectionString)
    with db.transaction() as conn:
        root = Data(np.arange(10))
        conn.root.Root = root
        child = Data(np.arange(10))
        root.children.append(child)

    # Removing child reference from root
    with db.transaction() as conn:
        conn.root.Root.children.pop()
    db.close()
    print("blob directory:", [[os.path.join(rootDir, f) for f in files] for rootDir, _, files in os.walk("blob") if files])

    # Reopening the database and packing up to one second in the future
    db = config.databaseFromString(connectionString)
    db.pack(time.time() + 1)
    db.close()
    print("blob directory:", [[os.path.join(rootDir, f) for f in files] for rootDir, _, files in os.walk("blob") if files])


if __name__ == "__main__":
    main()
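The example above does the following: it removes any existing database and blob files, creates a database whose root object has one child (each holding a blob), removes the child reference from the root, closes the database, and then reopens it and calls db.pack for one second in the future.
The output of the minimal working example is the following: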
INFO:ZODB.blob:(23376) Blob directory '<some path>/blob/' does not exist. Created new directory.
INFO:ZODB.blob:(23376) Blob temporary directory './blob/tmp' does not exist. Created new directory.
blob directory: [['blob/.layout'], ['blob/3/.lock', 'blob/3/0.03da352c4c5d8877.blob'], ['blob/6/.lock', 'blob/6/0.03da352c4c5d8877.blob']]
INFO:relstorage.storage.pack:pack: beginning pre-pack
INFO:relstorage.storage.pack:Analyzing transactions committed Thu Aug 27 11:48:17 2020 or before (TID 277592791412927078)
INFO:relstorage.adapters.packundo:pre_pack: filling the pack_object table
INFO:relstorage.adapters.packundo:pre_pack: Filled the pack_object table
INFO:relstorage.adapters.packundo:pre_pack: analyzing references from 7 object(s) (memory delta: 256.00 KB)
INFO:relstorage.adapters.packundo:pre_pack: objects analyzed: 7/7
INFO:relstorage.adapters.packundo:pre_pack: downloading pack_object and object_ref.
INFO:relstorage.adapters.packundo:pre_pack: traversing the object graph to find reachable objects.
INFO:relstorage.adapters.packundo:pre_pack: marking objects reachable: 4
INFO:relstorage.adapters.packundo:pre_pack: finished successfully
INFO:relstorage.storage.pack:pack: pre-pack complete
INFO:relstorage.adapters.packundo:pack: will remove 3 object(s)
INFO:relstorage.adapters.packundo:pack: cleaning up
INFO:relstorage.adapters.packundo:pack: finished successfully
blob directory: [['blob/.layout'], ['blob/3/.lock', 'blob/3/0.03da352c4c5d8877.blob'], ['blob/6/.lock', 'blob/6/0.03da352c4c5d8877.blob']]
As you can see, db.pack does remove 3 objects ("will remove 3 object(s)"), but the blobs in the file system are unchanged.
In the unit tests of RelStorage it appears that they do test if the blobs are removed from the file system (see here), but in the script above it does not work.
What am I doing wrong? Any hint/link/help is appreciated.
By default, the blob storage directory is used as a cache, storing blob data that also is stored in the database; the idea is that loading blob data from a local disk cache is faster than from a remote database server. Packing in a history-free storage with caching blob storage doesn’t delete unreachable blob files, instead relying on the file size limiter to evict stale cache data when room needs to be made. However, you did not set a size limit, so the cache grows unbounded and those unreachable blob files will live on forever.
Packing can’t remove blob files here because the cache is local to each ZODB client; it is outside the jurisdiction of the ZODB storage, as it were. This may not be as apparent when using SQLite as the database layer but imagine using Postgres instead, on a separate server, with multiple clients across different computers and you can see that cache clean-up is not feasible when packing.
Note that the other blob storage option is the shared blob storage, which is probably closer to what you expected this to be: all blob data is stored on disk, not in the database. When used with a remote database server and multiple clients you’d need to place this on something like an NFS share. Packing operates directly on the blobs in that case and unreachable blob files are removed immediately when you pack.
You have two options:
Set a size limit for the blob cache by setting blob-cache-size. Packing still won’t remove the blob files, but they will be removed when space is running low.
Switch to a shared blob cache (set shared-blob-dir to true). For a SQLite-backed RelStorage this probably makes more sense than a caching blob storage, in spite of the dire warnings in the documentation!
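For the first option, a minimal sketch of the configuration might look like the following. The 100mb limit is an arbitrary value picked for illustration, and the variable name cachedConnectionString is hypothetical; as far as I can tell from the RelStorage documentation, the option takes a size in bytes and accepts suffixes such as mb or gb.

```python
# Sketch of option 1: keep the caching blob storage, but cap its size.
# The 100mb figure is an illustrative assumption, not a recommendation;
# it should be comfortably larger than the largest blob you store.
cachedConnectionString = """
%import relstorage
<zodb main>
    <relstorage>
        blob-dir ./blob
        blob-cache-size 100mb
        keep-history false
        cache-local-mb 0
        <sqlite3>
            data-dir .
        </sqlite3>
    </relstorage>
</zodb>
"""
```

This string would be passed to config.databaseFromString exactly as in the question. Expect the unreachable blob files to still be on disk right after db.pack; with this setup they are only evicted when the cache needs room.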
So the easiest change would be to switch blob storage modes:
connectionString = """
%import relstorage
<zodb main>
    <relstorage>
        blob-dir ./blob
        shared-blob-dir true
        keep-history false
        cache-local-mb 0
        <sqlite3>
            data-dir .
        </sqlite3>
    </relstorage>
</zodb>
"""
The output then changes to:
INFO:ZODB.blob:(26177) Blob directory '<some path>/blob/' does not exist. Created new directory.
INFO:ZODB.blob:(26177) Blob temporary directory './blob/tmp' does not exist. Created new directory.
blob directory: [['blob/.layout'], ['blob/0x00/0x00/0x00/0x00/0x00/0x00/0x00/0x03/0x03da4f169582cd22.blob', 'blob/0x00/0x00/0x00/0x00/0x00/0x00/0x00/0x03/.lock'], ['blob/0x00/0x00/0x00/0x00/0x00/0x00/0x00/0x06/0x03da4f169582cd22.blob', 'blob/0x00/0x00/0x00/0x00/0x00/0x00/0x00/0x06/.lock']]
INFO:relstorage.storage.pack:pack: beginning pre-pack
INFO:relstorage.storage.pack:Analyzing transactions committed Tue Sep 1 01:22:35 2020 or before (TID 277621285453417864)
INFO:relstorage.adapters.packundo:pre_pack: filling the pack_object table
INFO:relstorage.adapters.packundo:pre_pack: Filled the pack_object table
INFO:relstorage.adapters.packundo:pre_pack: analyzing references from 7 object(s) (memory delta: 0 KB)
INFO:relstorage.adapters.packundo:pre_pack: objects analyzed: 7/7
INFO:relstorage.adapters.packundo:pre_pack: downloading pack_object and object_ref.
INFO:relstorage.adapters.packundo:pre_pack: traversing the object graph to find reachable objects.
INFO:relstorage.adapters.packundo:pre_pack: marking objects reachable: 4
INFO:relstorage.adapters.packundo:pre_pack: finished successfully
INFO:relstorage.storage.pack:pack: pre-pack complete
INFO:relstorage.adapters.packundo:pack: will remove 3 object(s)
INFO:relstorage.adapters.packundo:pack: cleaning up
INFO:relstorage.adapters.packundo:pack: finished successfully
blob directory: [['blob/.layout'], ['blob/0x00/0x00/0x00/0x00/0x00/0x00/0x00/0x03/0x03da4f169582cd22.blob', 'blob/0x00/0x00/0x00/0x00/0x00/0x00/0x00/0x03/.lock']]
And yes, the blob dir layout changes, so it can deal with every possible OID, ever. OID 6 has been removed however.
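If you would rather check this programmatically than read the nested list output, a small helper along these lines could be called before and after db.pack. It is only a sketch: the name list_blob_files is my own, and it assumes blob files can be recognised by their .blob suffix, as in the output shown here.

```python
import os


def list_blob_files(blob_dir):
    """Return the sorted paths of all *.blob files below blob_dir."""
    found = []
    for root_dir, _, files in os.walk(blob_dir):
        for name in files:
            # Skip bookkeeping files such as .layout and .lock
            if name.endswith(".blob"):
                found.append(os.path.join(root_dir, name))
    return sorted(found)
```

For example, comparing len(list_blob_files("blob")) before and after the pack should show the count dropping from 2 to 1 with the shared blob directory, and staying at 2 with the caching setup from the question.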
The unit tests you found are only run when testing with a shared blob cache:
# If the blob directory is a cache, don't test packing,
# since packing can not remove blobs from all caches.
test_packing = shared_blob_dir