Cjoerg
Cjoerg

Reputation: 1325

How to bulk update/upsert with mongoid/mongodb?

I have a database with millions of Orderdocuments. I batch insert them with the following method:

Order.collection.insert([
                         {:_id=>BSON::ObjectId('5471944843687229cdfb0000'), :status=>"open", :name=> "Benny"},
                         {:_id=>BSON::ObjectId('5471944843687229cdfc0000'), :status=>"open", :name=> "Allan"}
                        ])

I regularly need to update the status attribute on the orders. It would be way to inefficient to update them individually with the update_attribute method.

How do I bulk update multiple MongoDB documents?

The desired solution can best be described with the below "fictional" code:

# IMPORTANT: The exemplified upsert method does not exist

Order.collection.upsert([
                         {:_id=>BSON::ObjectId('5471944843687229cdfb0000'), :status=>"closed"},
                         {:_id=>BSON::ObjectId('5471944843687229cdfc0000'), :status=>"some_other_status"}
                        ])

Fyi, there might be a similar question/answer in this SO post, but in all honesty I don't follow the answer.

Upvotes: 4

Views: 7294

Answers (4)

Cyril Duchon-Doris
Cyril Duchon-Doris

Reputation: 13949

The best answer in the referenced question can be simplified to

id_status = [['5471944843687229cdfb0000','closed'], ...] 

bulk_order = id_status.map do |id, status| # Using array destructuration
  { update_one:
    {
      filter: { _id: id },
      update: { :'$set' => {
        status: status,
      }}
    }
  }
end
YourCollection.collection.bulk_write(bulk_order)

Upvotes: 6

Fran Martinez
Fran Martinez

Reputation: 3042

The real problem here are updates. Updates are slow because it needs to read, replace and change a document.

I've been blocked with in the same problem for many days. I didn't find any solution in stackoverflow nor in any other websites. Therefore, I wrote my own solution. Maybe you will find that it's not very "clean", but it works with excellent time results.

The solution consists in destroy an create again this document. Destroy is very fast, and create new documents using the bulk execution "collection.insert" is super fast.

def get_orders(*params)
   Order.where(# some conditions).asc(:id)
end

namespace :my_collection_repairer do
desc ""

  task update: :environment do
    all_orders = get_orders(# some conditions)
    while all_orders.count > 0
      num_docs = all_orders.count
      group_size = 10000
      num_groups = (Float(num_docs) / group_size).ceil
      puts "#{num_docs} documents found. #{num_groups} groups calculated."

      1.upto(num_groups) do |group|
        updated_order_list = []
        order_group = all_orders.page(group).per(group_size)
        puts "group #{group}"

        order_group.each do |order|
          updated_order = update_order(order) # this represents your custom update method
          updated_order_list << updated_order.as_document
          order.destroy
        end

        Order.collection.insert(updated_order_list)
        puts "Group #{group} updated."
      end
      all_orders = get_orders(# some conditions)
    end
  end
end

Upvotes: 1

SUNDARRAJAN K
SUNDARRAJAN K

Reputation: 2295

Sets the upsert option to true for an update or a replacement operation and has the following syntax:

 bulk.find( { status: "closed" } ).update( { $set: { status: "some_other_status" } } );
 bulk.execute();

Adds a multi update operation to a bulk operations list. The method updates specific fields in existing documents.

Use the Bulk.find() method to specify the condition that determines which documents to update. The Bulk.find.update() method updates all matching documents. To specify a single document update, see Bulk.find.updateOne().

 var bulk = db.collection.initializeUnorderedBulkOp();
 bulk.find( { status: "closed" } ).upsert().update(
{
 $set: { status: "some_other_status"}
}
);
bulk.execute();

Note:

To specify upsert: true for this operation, use Bulk.find.upsert(). With Bulk.find.upsert(), if no documents match the Bulk.find() query condition, the update operation inserts only a single document. Hope this helps.

Upvotes: -1

borjagvo
borjagvo

Reputation: 2081

Firstly, you'll need to filter Orders for only those matching ids of orders_to_update. You filter them with the any_in Criteria method. Then update all of them in bulk with update_all.

Like so:

orders_to_update = [BSON::ObjectId('5471944843687229cdfb0000'), BSON::ObjectId('5471944843687229cdfc0000')]

Order.any_in(id: orders_to_update).update_all(status: "closed")

Upvotes: 3

Related Questions