daiyue

Reputation: 7458

mongodb how to delete_many based on mongo _ids

I have a collection (coll) in the db that looks like this:

_id                                    pri_key
ObjectId("5b20f64dc227f879944f330c")   a_1
ObjectId("5b20f64dc227f879944f330d")   b_1
ObjectId("5b20f64dc227f879944f330e")   c_1
ObjectId("5b20f64dc227f879944f330f")   d_1
ObjectId("5b20f64dc227f879944f3310")   e_1

I want to delete_many docs using their _ids, where the _ids correspond to a list (say keys) of pri_key values in coll, e.g. keys = ['a_1', 'b_1', 'c_1']. I am wondering how to do that.

While I can do db.coll.delete_many({'pri_key': {'$in': keys}}), I suspect MongoDB handles _id faster than other keys defined in the docs.
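For reference, a minimal pymongo sketch of the two-step approach (look up the _ids by pri_key, then delete by _id); the client, database and collection names below are placeholders:

from pymongo import MongoClient

client = MongoClient()       # assumes a local MongoDB instance
coll = client['db']['coll']  # placeholder database/collection names

keys = ['a_1', 'b_1', 'c_1']

# Find the _ids of the docs whose pri_key is in keys, then delete by _id.
ids = [doc['_id'] for doc in coll.find({'pri_key': {'$in': keys}}, {'_id': 1})]
result = coll.delete_many({'_id': {'$in': ids}})
print(result.deleted_count)

Note that the lookup itself still has to match on pri_key, so without an index on pri_key this is unlikely to be faster than deleting on pri_key directly.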

UPDATE: the original problem is that I convert data stored in a pandas DataFrame (df) into a list of dicts and then insert them into mongo. This mongo write is also an incremental insertion, meaning that if there are any overlapping docs between the collection in the db and the list, I delete the docs in the collection first and then insert the new ones from the list.

The deletion uses pri_key, so I first get the pri_key values from the df into a list, then simply use db.coll.delete_many({'pri_key': {'$in': keys}}) to delete those overlapping docs in the collection.

The problem is that I found it very slow, especially when facing 10 million docs in a single collection. So I am wondering whether there is a way to speed up this process.
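A minimal sketch of the incremental write described above, assuming pymongo and placeholder database/collection names:

import pandas as pd
from pymongo import MongoClient

client = MongoClient()       # assumes a local MongoDB instance
coll = client['db']['coll']  # placeholder database/collection names

df = pd.DataFrame({'pri_key': ['a_1', 'b_1'], 'value': [1, 2]})  # toy data

records = df.to_dict('records')  # DataFrame -> list of dicts
keys = df['pri_key'].tolist()    # pri_key values that may overlap existing docs

# Delete any existing docs sharing a pri_key, then insert the fresh ones.
coll.delete_many({'pri_key': {'$in': keys}})
coll.insert_many(records)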

Upvotes: 0

Views: 405

Answers (1)

Alex Blex

Reputation: 37128

Create an index in mongodb:

db.collection.createIndex({pri_key:1})

It will speed up deletion by pri_key.

If you will never have two documents with the same pri_key, it can be

db.collection.createIndex({pri_key:1}, {unique: true})

In this case an attempt to add a document with the same pri_key will result in an error.
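If the writes are done from pymongo rather than the shell, the equivalent index creation (database/collection names are placeholders) would be roughly:

from pymongo import MongoClient, ASCENDING

client = MongoClient()       # assumes a local MongoDB instance
coll = client['db']['coll']  # placeholder database/collection names

coll.create_index([('pri_key', ASCENDING)])  # plain ascending index on pri_key
# or, if pri_key values are guaranteed to be unique:
coll.create_index([('pri_key', ASCENDING)], unique=True)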

Upvotes: 1
