Reputation: 2738
What is the most efficient way to delete orphan blobs from a Blobstore?
App functionality & scope:
Possible, yet inefficient solutions:
Is there any better way of doing this? I've searched for similar posts yet I couldn't find any mentioning efficient solutions.
Thanks in advance!
Upvotes: 2
Views: 1023
Reputation: 1278
Use drafts! Save the post as a draft after each upload, and then skip the cleanup entirely: let the users decide for themselves whether to wipe it out.
If you're planning Facebook-style posts, either use drafts or make the post private. Why bother deleting users' data?
Upvotes: 0
Reputation: 2738
Thanks for the comments. However, although I understand those solutions well, I find them too inefficient. Querying thousands of entries for those that are flagged as "unused" is not ideal.
I believe I have come up with a better way and would like to hear your thoughts on it:
When a blob is saved, immediately a deferred task is created to delete the same blob in an hour’s time. If the post is created and saved, the deferred task is deleted, thus the blob will not be deleted in an hour’s time.
I believe this saves you from having to query thousands of entries every single hour.
What are your thoughts on this solution?
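To make the idea concrete, here is a rough sketch of the schedule-then-cancel logic in plain Python. It is only an in-memory model: `PendingDeletions`, its method names, and the `delete_blob` callback are names I made up for illustration, and on App Engine the scheduling and cancelling would go through the task queue rather than a dict.

```python
import time


class PendingDeletions:
    """In-memory stand-in for a queue of delayed blob deletions (hypothetical)."""

    def __init__(self, delay_seconds=3600):
        self.delay_seconds = delay_seconds
        self._due_at = {}  # blob_key -> time at which the blob should be deleted

    def on_blob_uploaded(self, blob_key, now=None):
        # Schedule the deletion as soon as the blob is stored.
        now = time.time() if now is None else now
        self._due_at[blob_key] = now + self.delay_seconds

    def on_post_saved(self, blob_key):
        # The blob is now referenced by a post, so cancel its pending deletion.
        self._due_at.pop(blob_key, None)

    def run_due(self, delete_blob, now=None):
        # Delete every blob whose deadline has passed and return the deleted keys.
        now = time.time() if now is None else now
        due = [key for key, due_at in self._due_at.items() if due_at <= now]
        for key in due:
            delete_blob(key)
            del self._due_at[key]
        return due


# Two uploads; only "blob-a" ends up attached to a saved post.
pending = PendingDeletions(delay_seconds=3600)
deleted = []
pending.on_blob_uploaded("blob-a", now=0)
pending.on_blob_uploaded("blob-b", now=0)
pending.on_post_saved("blob-a")
pending.run_due(deleted.append, now=3600)
```

After the hour has passed, only the blob that never made it into a post is removed, and no query over existing entries is needed.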
Upvotes: 3
Reputation: 41099
You can create an entity that links blobs to users. When a user uploads a blob, you immediately create a new record with the blob id, user id (or post id), and time created. When a user submits a post, you add a flag to this entity, indicating that a blob is used.
Now your cron job needs to fetch all entities of this kind where the flag is not "true" and the creation time is more than one hour ago. Moreover, you can fetch keys only, which is a more efficient operation than fetching full entities.
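The selection rule for the cron job could look roughly like the sketch below. It filters a plain in-memory list instead of running a datastore query, and `BlobLink`, its fields, and `orphan_blob_ids` are hypothetical names chosen for illustration. It returns only the blob ids, which loosely mirrors the keys-only fetch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BlobLink:
    """Hypothetical record linking an uploaded blob to its owner."""
    blob_id: str
    owner_id: str
    created: datetime
    used: bool = False  # set to True once the post is submitted


def orphan_blob_ids(links, now, max_age=timedelta(hours=1)):
    """Return ids of blobs that are still unused and older than max_age."""
    cutoff = now - max_age
    return [link.blob_id for link in links
            if not link.used and link.created < cutoff]


now = datetime(2024, 1, 1, 12, 0)
links = [
    BlobLink("b1", "u1", now - timedelta(hours=2), used=True),  # attached to a post
    BlobLink("b2", "u1", now - timedelta(hours=2)),             # old and unused
    BlobLink("b3", "u2", now - timedelta(minutes=10)),          # too recent to judge
]
orphans = orphan_blob_ids(links, now)
```

Here only `b2` qualifies for deletion: `b1` is flagged as used and `b3` is less than an hour old.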
Upvotes: 1
Reputation: 11706
A blob also has a filename. After the post is saved, you can delete all the old blobs with the same filename. The duplicates to delete must either have the same owner or have no owner. You also have to delete the blobs that do not have an owner.
Here is an example that deletes the duplicates after an upload.
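from google.appengine.ext import blobstore
# Delete every blob sharing this filename except the one the post references.
blobs = blobstore.BlobInfo.gql("WHERE filename = :1", filename)
for blob in blobs:
    if blob.key() != userdata.blob_ref.key():
        blob.delete()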
To clean up "not used" blobs, you can schedule a task after every upload, to run after an hour.
Upvotes: 1