mabounassif

Reputation: 2341

Bulk update_or_create optimization in CouchDb with Ruby

I have the following method that stores or updates a list of event JSON objects. I couldn't find a bulk create_or_update function for CouchDB, so I query each object to see whether it exists in the database and then create or update it accordingly. Unfortunately this is highly inefficient: it takes 6 minutes to process 1725 events. Can someone propose a better design? It has to finish in a couple of seconds. My CouchDB is actually an SSL Cloudant database, and my app is hosted on Heroku, but it is a different Heroku app from the one that is actually combined with Cloudant.

def self.store(bulk, resource)
  JSON::Validator.validate!(SCHEMA, bulk, :list => true)
  # Characters allowed in a generated document ID.
  id_chars = [('a'..'z'), ('A'..'Z'), (0..9)].flat_map(&:to_a).push('-', '_')
  bulk.each do |event|
    # One view query per event to check whether it already exists.
    response = resource.get("/database-dev/_design/Event/_view/byEID?key=\"#{event['eid']}\"")
    if response["rows"].nil? || response["rows"].empty?
      event['_id'] = (0..50).map { id_chars[rand(id_chars.length)] }.join
      event['resource'] = 'Event'
      resource.post('/database-dev', event.to_json)
    else
      resource.put("/database-dev/#{response['rows'][0]['id']}", event.to_json)
    end
  end
end

Upvotes: 1

Views: 206

Answers (1)

JasonSmith

Reputation: 73752

You can use the CouchDB bulk document API to create-or-update. Of course, since you are "flying blind" with the _rev values, the trade-off is that you might create revision conflicts. That might not be a problem for you, or in some cases it may be impossible or extremely rare (depending on your application). Simply add the "all_or_nothing":true option in your POST body.

Alternatively, you can do a bulk create-or-update in two round-trips. First fetch the current revisions of all the documents, then post a traditional _bulk_docs request with all the _rev values set.

POST /database-dev/_all_docs
Content-Type: application/json

{"keys": ["id_1", "id_2", "bad_id"]}

HTTP/1.1 200 OK
...couch headers...

{"total_rows":10,"offset":0,"rows":[
  {"id":"id_1","key":"id_1","value":{"rev":"1-6919deb28bdb1d4cf5b53188be5683be"}},
  {"id":"id_2","key":"id_2","value":{"rev":"1-37bb8117bc6c7b182ca26aae16717408"}},
  {"key":"bad_id", "error":"not_found"}
]}

(You can do the same thing when requesting a view.)

Now you know all the values for _rev to send in the _bulk_docs request. (If a row has a "rev" value, use it; otherwise leave _rev out to create a new document.)
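
As a rough sketch of the two round-trips in Ruby, assuming every document already carries the _id it should be stored under (the question's events are looked up by eid through a view, so that part would need adapting). The helper names all_docs_request_body, attach_revs and bulk_docs_request_body are made up for illustration; only the _all_docs and _bulk_docs body shapes come from CouchDB itself.

```ruby
require 'json'

# Body for the first round-trip: POST /database-dev/_all_docs
# (hypothetical helper name)
def all_docs_request_body(docs)
  { 'keys' => docs.map { |doc| doc['_id'] } }.to_json
end

# Copy the current revision from a parsed _all_docs response onto each
# document. Rows with "error":"not_found" have no "value", so those
# documents are sent without _rev and get created.
def attach_revs(docs, all_docs_response)
  revs = {}
  all_docs_response['rows'].each do |row|
    next unless row['value']
    revs[row['id']] = row['value']['rev']
  end

  docs.map do |doc|
    rev = revs[doc['_id']]
    if rev
      doc.merge('_rev' => rev)
    else
      # Drop any stale _rev so the document is treated as new.
      doc.reject { |key, _| key == '_rev' }
    end
  end
end

# Body for the second round-trip: POST /database-dev/_bulk_docs
def bulk_docs_request_body(docs, all_docs_response)
  { 'docs' => attach_revs(docs, all_docs_response) }.to_json
end
```

The two strings would be sent as the bodies of POST /database-dev/_all_docs and POST /database-dev/_bulk_docs, so the whole batch costs two requests instead of one or two per event. A document can still change between the two requests, so the _bulk_docs response should be checked for per-document conflict errors.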

Upvotes: 1
