Alon Burg

Reputation: 2540

Experiencing Mongo::OperationTimeout every 20 minutes to 2 hours

I seem to be getting a Mongo::OperationTimeout every ~20 minutes to 1 hour. My stack:

I have tried setting the TCP keepalive time on EC2 to 300, as described in http://www.mongodb.org/display/DOCS/Amazon+EC2, but it did not help.
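For context, the kernel-wide keepalive time from that page (presumably the Linux `net.ipv4.tcp_keepalive_time` setting) only affects sockets that have `SO_KEEPALIVE` switched on. I have not checked whether the Mongo driver enables it, so treat that as an open assumption. A minimal sketch of enabling keepalive on one socket with Ruby's standard `socket` library follows. `enable_keepalive` is a hypothetical helper, and a throwaway local server stands in for mongod.

```ruby
require "socket"

# Hypothetical helper: turn on TCP keepalive for a single socket and, where the
# platform exposes it (Linux), set the idle time before the first probe.
def enable_keepalive(sock, idle_seconds = 300)
  sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)
  if defined?(Socket::TCP_KEEPIDLE)
    sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPIDLE, idle_seconds)
  end
  sock
end

# A throwaway local server stands in for mongod in this sketch.
server = TCPServer.new("127.0.0.1", 0)
client = TCPSocket.new("127.0.0.1", server.addr[1])
enable_keepalive(client)

# Read the options back to confirm they were applied.
puts client.getsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE).bool
if defined?(Socket::TCP_KEEPIDLE)
  puts client.getsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPIDLE).int
end

client.close
server.close
```

Probes only start after the idle time and only on sockets with the option set, so an idle connection can still be dropped by something in between (for example an EC2 or NAT idle timeout) if keepalive is off for that socket.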

I have also tried a single-primary configuration instead of the replica set, but that did not help either.

Below is my mongoid.conf:

production:
  database: my-app-name
  op_timeout: 10
  read_secondary: true
  max_retries_on_connection_failure: 3
  identity_map_enabled: true
  allow_dynamic_fields: false
  hosts:
    - - ip-XXX.ec2.internal
      - 27017
    - - ip-XXX.ec2.internal
      - 27017
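For reference, as far as I understand the 1.x Ruby driver, `op_timeout` above is the number of seconds to wait for a read before raising `Mongo::OperationTimeout`. The sketch below imitates that with a plain socket. The client sends a request to a local server that accepts the connection but never replies, which is roughly what a dead or stale connection looks like from the client side. `OpTimeout` and `read_reply_with_timeout` are hypothetical names, not driver API, and the timeout is shortened to 0.2 s so the example finishes quickly.

```ruby
require "socket"

# Hypothetical stand-in for the driver's timeout error (not Mongo::OperationTimeout).
class OpTimeout < StandardError; end

# Wait at most op_timeout seconds for the socket to become readable, then read.
# IO.select returns nil when the deadline passes with nothing to read.
def read_reply_with_timeout(sock, op_timeout)
  ready = IO.select([sock], nil, nil, op_timeout)
  raise OpTimeout, "no reply within #{op_timeout}s" unless ready
  sock.readpartial(4096)
end

# A local server that accepts connections but never replies, standing in for
# a peer that has silently stopped answering.
server = TCPServer.new("127.0.0.1", 0)
client = TCPSocket.new("127.0.0.1", server.addr[1])
client.write("ping")

begin
  read_reply_with_timeout(client, 0.2)
rescue OpTimeout => e
  puts "timed out: #{e.message}"
end

client.close
server.close
```

The request is written successfully and only the wait for the reply fails, which matches a worker that appears healthy until its first query after a long idle period.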

Upvotes: 3

Views: 825

Answers (1)

Alon Burg


After discussing it as a team, here are the points we came up with regarding our situation:

  • We are using Mongoid 3.0 with op_timeout: 30 (Mongoid 2.3 and earlier did not have op_timeout enabled), which is what actually surfaces the OperationTimeout. Many other users may be hitting the same problem without ever seeing it in their logs, only as stuck Unicorn workers.
  • We are using Unicorn, which spawns worker processes ahead of time and keeps them waiting, unlike Passenger, which scales dynamically. Since we are currently only in test mode with no real traffic, many of the workers may sit idle long enough for their Mongo connections to go stale. Most people probably never reach this state, but they might hit it every now and then.
  • The Linux TCP keepalive setting described at www.mongodb.org/display/DOCS/Troubleshooting#Troubleshooting-Socketerrorsinshardedclustersandreplicasets does not seem to help.
  • For now, I have created a dummy Rack middleware that performs an initial Mongo query and handles the exception if needed. The code is here: https://gist.github.com/1647879
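The gist is not reproduced here, but the warm-up middleware idea from the last point can be sketched as follows. This assumes only the Rack convention that a middleware wraps an app and its `call(env)` returns a status/headers/body triple. `MongoWarmup`, the `ping` callable and `FakeTimeout` are hypothetical. In a real app `ping` would issue a trivial Mongo query and `error_class` would be `Mongo::OperationTimeout`. This is one possible shape of the idea, not the code from the gist.

```ruby
# Hypothetical Rack-style middleware: before handing the request to the app,
# run a cheap "ping" and retry it if it raises the given error class, so a
# stale connection has a chance to be replaced before the real query runs.
class MongoWarmup
  def initialize(app, ping, error_class, retries = 1)
    @app = app                 # the next Rack app in the stack
    @ping = ping               # callable issuing a trivial query (assumption)
    @error_class = error_class # e.g. Mongo::OperationTimeout in a real app
    @retries = retries         # extra ping attempts after the first failure
  end

  def call(env)
    attempts = 0
    begin
      @ping.call
    rescue @error_class => e
      attempts += 1
      warn "warm-up ping failed (#{e.class}), attempt #{attempts}"
      retry if attempts <= @retries
      # Out of retries: fall through and let the app's own query raise if the
      # connection is still broken.
    end
    @app.call(env)
  end
end

# Usage with a fake app and a ping that fails only on its first call.
class FakeTimeout < StandardError; end

calls = 0
flaky_ping = -> { calls += 1; raise FakeTimeout if calls == 1 }
app = ->(_env) { [200, { "content-type" => "text/plain" }, ["ok"]] }

status, _headers, body = MongoWarmup.new(app, flaky_ping, FakeTimeout).call({})
puts "#{status} #{body.join} after #{calls} ping(s)"
```

In a `config.ru` this would be mounted with `use` before `run`. Retrying only helps if the driver actually reconnects after the failed operation, which this sketch does not verify. Errors other than `error_class` are deliberately not rescued.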

Upvotes: 3
