Reputation: 7458
I have a collection (coll) in a db that looks like this:
_id                                   pri_key
ObjectId("5b20f64dc227f879944f330c")  a_1
ObjectId("5b20f64dc227f879944f330d")  b_1
ObjectId("5b20f64dc227f879944f330e")  c_1
ObjectId("5b20f64dc227f879944f330f")  d_1
ObjectId("5b20f64dc227f879944f3310")  e_1
I want to delete_many docs in coll using their _ids, where those _ids correspond to a list (say keys) of pri_key values, i.e. keys = ['a_1', 'b_1', 'c_1']. I am wondering how to do that.
While I can do db.coll.delete_many({'pri_key': {'$in': keys}}), I suspect MongoDB handles lookups on _id faster than on other keys defined in the docs.
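For concreteness, the two-step version I have in mind looks roughly like this (an untested pymongo sketch; the connection and the db/collection names are placeholders):

from pymongo import MongoClient

coll = MongoClient()['db']['coll']  # placeholder connection and names

keys = ['a_1', 'b_1', 'c_1']

# First fetch the _ids that correspond to the given pri_key values...
ids = [doc['_id'] for doc in coll.find({'pri_key': {'$in': keys}}, {'_id': 1})]

# ...then delete by _id.
result = coll.delete_many({'_id': {'$in': ids}})
print(result.deleted_count)

Though I realize the extra find adds a round trip, which may cancel out any gain from deleting by _id.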
UPDATE. The original problem is that I convert data stored in a pandas DataFrame (df) into a list of dicts, and then insert them into mongo. This mongo write is also an incremental insertion, meaning that if there are any overlapping docs between the collection in the db and the list, I delete those docs from the collection first and then insert the new ones from the list.
The deletion uses pri_key, so I first get the pri_key values from the df into a list, then simply use db.coll.delete_many({'pri_key': {'$in': keys}}) to delete those overlapping docs in the collection.
The problem is that I found this very slow, especially when facing 10 million docs in a single collection, so I am wondering whether there is a way to speed up this process.
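For reference, the incremental write looks roughly like this (a simplified sketch; the real df has more columns and the connection details are placeholders):

import pandas as pd
from pymongo import MongoClient

coll = MongoClient()['db']['coll']  # placeholder connection and names

# Toy stand-in for the real DataFrame.
df = pd.DataFrame({'pri_key': ['a_1', 'b_1'], 'value': [1, 2]})

keys = df['pri_key'].tolist()

# Drop any overlapping docs, then insert the fresh rows.
coll.delete_many({'pri_key': {'$in': keys}})
coll.insert_many(df.to_dict('records'))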
Upvotes: 0
Views: 405
Reputation: 37128
Create an index in MongoDB:
db.collection.createIndex({pri_key:1})
It will speed up deletion by pri_key.
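Since you are using pymongo, the equivalent there would be something like this (coll being a handle to your collection; the connection and names are placeholders):

from pymongo import MongoClient, ASCENDING

coll = MongoClient()['db']['coll']  # placeholder connection and names

# Plain ascending index on pri_key; safe to call repeatedly,
# as index creation is idempotent.
coll.create_index([('pri_key', ASCENDING)])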
If you will never ever have two documents with the same pri_key, it can be
db.collection.createIndex({pri_key:1}, {unique: true})
In this case an attempt to add a document with the same pri_key will result in an error.
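In pymongo the unique variant would look something like this, with the error you would catch on a duplicate insert (again a sketch with placeholder names; note that creating the unique index fails if the collection already contains duplicate pri_key values):

from pymongo import MongoClient, ASCENDING
from pymongo.errors import DuplicateKeyError

coll = MongoClient()['db']['coll']  # placeholder connection and names

# Unique index: duplicate pri_key inserts now raise DuplicateKeyError.
coll.create_index([('pri_key', ASCENDING)], unique=True)

try:
    coll.insert_one({'pri_key': 'a_1'})
except DuplicateKeyError:
    print('a document with that pri_key already exists')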
Upvotes: 1