Reputation: 932
I'm running an instance of Meteor on Digital Ocean, and hosting the Mongo database on Mongolab. If the site has been idle for a few hours and someone goes to a specific page, Meteor appears to drop its connection to the database for 3-15 minutes, without any errors or warnings. Here's what I've been able to figure out:
The Meteor server on DigitalOcean: Meteor.status() shows an active connection.
The MongoDB on Mongolab: the relevant log lines are in the "More Info" section below.
I suspect that it has something to do with the following publication:
Meteor.publish('spaceUtilSpace', function(view_id, space_id) {
  if (!checkSpaceUtilPermissions(view_id, "View Reader", this.userId)) {
    this.ready();
    return;
  }
  var thisUser = Meteor.users.findOne({_id: this.userId});
  var thisView = View_SpaceUtil.findOne({_id: view_id});
  if (thisView) {
    var thisSpace = Spaces.findOne({_id: space_id});
    return [
      View_SpaceUtil.find({_id: view_id}),
      Bldgs.find({_id: thisSpace.localID.bldg_id}),
      Spaces.find({_id: space_id}),
      Schedule.find({
        "localID.space_id": space_id,
        startDateMs: {$lte: thisView.time.toDate},
        endDateMs: {$gte: thisView.time.fromDate}
      })
    ];
  }
});
I suspect the problem is most likely in this line, since Schedule is my largest collection (~80,000 documents, 150 MB):

Schedule.find({"localID.space_id": space_id, startDateMs: {$lte: thisView.time.toDate}, endDateMs: {$gte: thisView.time.fromDate}})
At first I thought the query was simply taking too long to process and I just needed an index for it, but after creating an index on {"localID.space_id": 1, startDateMs: -1, endDateMs: 1}, I'm still having the same problem.
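In case it helps, this is roughly how I created that index from the mongo shell (a sketch; the collection name schedule is taken from the log lines below):

// create a compound index covering the space id and the date range fields
db.schedule.createIndex({ "localID.space_id": 1, startDateMs: -1, endDateMs: 1 })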
I'm starting to run low on ideas as to how to fix this, so any suggestions would be incredibly helpful. Thanks!
More Info
Going through the Mongo logs, I've found the following two lines:
2015-12-04T08:11:09.904-0800 I QUERY [conn51589] query myDatabase.schedule query: { localID.space_id: "mjEYjonRaFrrr8gcX", startDateMs: { $lte: 1451520000000.0 }, endDateMs: { $gte: 1262304000000.0 } } planSummary: COLLSCAN ntoreturn:0 ntoskip:0 nscanned:0 nscannedObjects:78172 keyUpdates:0 writeConflicts:0 numYields:6664 nreturned:0 reslen:20 locks:{ Global: { acquireCount: { r: 13330 } }, MMAPV1Journal: { acquireCount: { r: 6665 } }, Database: { acquireCount: { r: 6665 } }, Collection: { acquireCount: { R: 6665 } } } 232971ms
2015-12-04T08:11:10.429-0800 I QUERY [conn51593] query myDatabase.schedule query: { localID.space_id: "mjEYjonRaFrrr8gcX", startDateMs: { $lte: 1451520000000.0 }, endDateMs: { $gte: 1262304000000.0 } } planSummary: COLLSCAN ntoreturn:0 ntoskip:0 nscanned:0 nscannedObjects:78172 keyUpdates:0 writeConflicts:0 numYields:610 nreturned:0 reslen:20 locks:{ Global: { acquireCount: { r: 1222 } }, MMAPV1Journal: { acquireCount: { r: 611 } }, Database: { acquireCount: { r: 611 } }, Collection: { acquireCount: { R: 611 } } } 128ms
It appears the problem is that one query takes an incredibly long time to complete and blocks new queries until it finishes.
What's confusing me about these two is that the queries are identical, yet the 'acquireCount' values for the first one are roughly 10x higher and it took ~2000x longer to return. These fields are indexed... any ideas as to why this would happen?
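To double-check the query plan, I can run the same query with explain() in the mongo shell (a sketch, reusing the literal values from the log lines above):

// inspect which plan the optimizer picks and how many documents it scans
db.schedule.find({
  "localID.space_id": "mjEYjonRaFrrr8gcX",
  startDateMs: { $lte: 1451520000000 },
  endDateMs: { $gte: 1262304000000 }
}).explain("executionStats")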
Upvotes: 1
Views: 237
Reputation: 932
After some discussion with Mongolab support, I've got an answer (probably).
I'm on a shared cluster plan, so if a query hasn't been run for a few hours, its data is flushed from memory to let other users use that space. The next time the query runs, that data has to be reloaded into memory, which in this case was taking a very long time. I've also re-evaluated my indexing strategy and found that I'd missed the index I should have had: I'd indexed "localID.bldg_id" but forgotten to create a separate index that included "localID.space_id", which was the important one for this issue.
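To verify which indexes the collection actually has (and to confirm the missing one once it's added), the mongo shell can list them; a quick sketch, assuming the collection name schedule from the logs:

// list all indexes on the collection; after adding the missing index,
// the output should include an entry whose key contains "localID.space_id"
db.schedule.getIndexes()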
I'll have to wait until the memory flushes before I can verify that this solution is working, but it seems likely.
If it doesn't, Mongolab's suggestion is to move to a dedicated cluster rather than staying on a shared one.
Upvotes: 2