Hubro

Reputation: 59368

How do I find out how many Mongo documents were actually inserted?

I have a function that looks like this:

def insert_multiple_cakes(cake_list)
  ensure_indexes

  insert_list = cake_list.map { |cake| mongofy_values(cake.to_hash) }

  inserted = db[CAKE_COLLECTION].insert(insert_list, w: 0)

  return inserted.length
end

The goal of the function is to insert all cakes from cake_list into the Mongo database. Any cake that already exists in the database should be ignored. The function should return the number of cakes inserted, so if cake_list contains 5 cakes and 2 of those cakes already exist in the database, the function should return 3.
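For completeness, the deduplication relies on a unique index. Roughly, ensure_indexes does something like this (a sketch, assuming the 1.x Ruby driver's ensure_index API and that cakes are unique by name):

# Hypothetical sketch: enforce uniqueness by cake name so that
# duplicate inserts are rejected by the server.
def ensure_indexes
  db[CAKE_COLLECTION].ensure_index({ "name" => 1 }, unique: true)
end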

My problem is that after an hour of experimenting, I have concluded that the insert call gives me no way of finding out how many documents were actually inserted.

So it seems to me that the only way to achieve this is to iterate over the input list, perform a search for EVERY document and filter away those that already exist. My application is going to deal with (at least) thousands of inserts at a time, so I like this plan about as much as I'd like to jump off a bridge.

Have I misunderstood something, or is the Ruby client perhaps bugged?


To demonstrate, this function does exactly what I want and works:

def insert_multiple_cakes(cake_list)
  ensure_indexes

  collection = db[CAKE_COLLECTION]

  # Filters away any cakes that already exist in the database.
  filtered_list = cake_list.reject { |cake|
    collection.count(query: {"name" => cake.name}) == 1
  }

  insert_list = filtered_list.map { |cake| mongofy_values(cake.to_hash) }

  inserted = collection.insert(insert_list)

  return inserted.length
end

The problem is that it performs one count query for every single cake (a gazillion searches, give or take) where it should really only have to do one insert.


Documentation for Mongo::Collection#insert

Upvotes: 2

Views: 276

Answers (2)

Randall Hunt

Reputation: 12582

You can do something like this (source):

coll = MongoClient.new.db('test').collection('cakes')

bulk = coll.initialize_unordered_bulk_op
bulk.insert({'_id' => "strawberry"})
bulk.insert({'_id' => "strawberry"}) # duplicate key
bulk.insert({'_id' => "chocolate"})
bulk.insert({'_id' => "chocolate"}) # duplicate key

begin
  bulk.execute({:w => 1}) # :w => 1 is the default, but don't change it to 0 or you won't get the errors
rescue => ex
  p ex
  p ex.result
end

ex.result contains nInserted along with the reason each failing insert was rejected:

{"ok"=>1,
 "n"=>2,
 "code"=>65,
 "errmsg"=>"batch item errors occurred",
 "nInserted"=>2,
 "writeErrors"=>
  [{"index"=>1,
    "code"=>11000,
    "errmsg"=>
     "insertDocument :: caused by :: 11000 E11000 duplicate key error index: test.cakes.$_id_  dup key: { : \"strawberry\" }"},
   {"index"=>3,
    "code"=>11000,
    "errmsg"=>
     "insertDocument :: caused by :: 11000 E11000 duplicate key error index: test.cakes.$_id_  dup key: { : \"chocolate\" }"}]}

Upvotes: 5

Hubro

Reputation: 59368

Bulk operations were the way to go. I'm accepting ranman's answer, but I thought I should share my final code:

def insert_documents(collection_name, documents)
  collection = db[collection_name]

  bulk = collection.initialize_unordered_bulk_op
  inserts = 0

  documents.each { |doc|
    bulk.insert doc

    inserts += 1
  }

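  # execute raises Mongo::BulkWriteError if any of the inserts failed
  # (e.g. duplicate keys); its result still reports how many documents
  # actually made it in.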
  begin
    bulk.execute
  rescue Mongo::BulkWriteError => e
    inserts = e.result["nInserted"]
  end

  return inserts
end

def insert_cakes(cakes)
  ensure_cake_indexes

  doc_list = cakes.map { |cake|
    mongofy_values(cake.to_hash)
  }

  return insert_documents(CAKE_COLLECTION, doc_list)
end
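
For example (hypothetical data: two of the five cakes already exist in the database):

insert_cakes(five_cakes) # => 3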

Upvotes: 1
