Roy Toledo

Reputation: 643

MongoDB best practice for sorting by non-indexed fields

I have an app that lets users store their own custom data, so I can't know what the data looks like. However, I do want to let them sort it. This can be a significant amount of data, and MongoDB ends up giving me memory errors (32MB limit).

What would be the best way to approach this? How can I allow the user to sort a large amount of data by an unknown field?

Upvotes: 1

Views: 1363

Answers (3)

Roy Toledo

Reputation: 643

My current thought is an additional collection holding (1) the entity id, (2) the field name, and (3) the field value. That collection would be indexed, so I can pull the ordered entity ids from it and then load the full documents by id.
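The idea above can be sketched in plain JavaScript. This is only an in-memory illustration: the helper names toIndexEntries and sortedEntityIds and the sample documents are assumptions, not part of any library. In a real setup the rows would live in their own collection, and a compound index on { field: 1, value: 1 } should let MongoDB return them in order without an in-memory sort.

```javascript
// Hypothetical helper: turn one user document into rows for a separate
// "sort index" collection, one row per field: { entityId, field, value }.
function toIndexEntries (doc) {
  const { _id, ...fields } = doc
  return Object.entries(fields).map(([field, value]) => ({
    entityId: _id,
    field,
    value
  }))
}

// In-memory stand-in for "find rows for one field, ordered by value".
// direction is 1 for ascending and -1 for descending.
function sortedEntityIds (entries, field, direction = 1) {
  return entries
    .filter(entry => entry.field === field)
    .sort((a, b) => (a.value < b.value ? -direction : a.value > b.value ? direction : 0))
    .map(entry => entry.entityId)
}

// Sample data (made up for illustration).
const docs = [
  { _id: 'a', price: 30, name: 'Chair' },
  { _id: 'b', price: 10, name: 'Table' },
  { _id: 'c', price: 20 }
]

const entries = docs.flatMap(toIndexEntries)
const idsByPrice = sortedEntityIds(entries, 'price')
```

The ordered ids would then be used to load the full documents in pages, so only one page of full documents is in memory at a time.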

Upvotes: 0

Jankapunkt

Reputation: 8423

Since you tagged this question with Meteor, I assume you have the default Meteor environment, where you can use the lightweight client-side Mongo collections.

This gives you the opportunity to publish (Publication) or return (Method) your data mostly unsorted and let the clients handle the sorting.

Consider this: just 100 clients subscribing to a publication that re-runs on every sort action (because the subscription parameters change, so the publication changes, too).

That alone makes your server consume a large amount of RAM to keep the observers (OPLOG etc.) running for 100 publications, each querying huge numbers of documents.

Possible performant solutions are described below. Please keep in mind that they are not bound to any front end and are rather a conceptual description. You will have to add reactivity etc. based on your front-end environment.

Option A - Publish unsorted, let clients sort

server

Meteor.publish('hugeData', function () {
  return MyCollection.find({ ...})
})

client

const handle = Meteor.subscribe('hugeData')
if (handle.ready()) {
  const sortedData = MyCollection.find({ ... }, {sort: { someField: -1 } })
}

A big plus here is that you can inform the clients about the completeness status if you use cursor.observeChanges.

Note that if you want to scan backwards (return the newest docs first), you can use the hint option on find:

Meteor.publish('hugeData', function () {
  return MyCollection.find({ ...}, { hint: { $natural: -1 } })
})

This is way more performant than { sort: { fieldName: -1} }.
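Because the field to sort by is only known at runtime, the sort option for the client-side find has to be built dynamically. A minimal sketch, assuming a hypothetical helper named buildSortSpec (not part of Meteor) and a field name that comes from user input:

```javascript
// Hypothetical helper: build the `sort` option for a field chosen by the user.
function buildSortSpec (fieldName, descending = false) {
  if (typeof fieldName !== 'string' || fieldName.length === 0) {
    throw new TypeError('fieldName must be a non-empty string')
  }
  // Reject operator-like names so user input is never treated as an operator.
  if (fieldName.startsWith('$')) {
    throw new Error(`invalid field name: ${fieldName}`)
  }
  // Computed property name: the key is the value of fieldName.
  return { [fieldName]: descending ? -1 : 1 }
}

const sortSpec = buildSortSpec('someField', true)

// Intended use on the client (assumes MyCollection from the snippets above):
// MyCollection.find({ ... }, { sort: sortSpec })
```

Keeping the validation in one place makes it easier to tighten later, for example by only allowing field names the user has actually defined.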

Option B - return unsorted from Method, let clients sort

There may still be a problem with Option A, since it still consumes a lot of RAM if there are lots of subscribers. An alternative (especially if live-data changes are not so relevant) is to use Meteor Methods:

server

Meteor.methods({
  hugeData () {
    return MyCollection.find({ ...}).fetch()
  }
})

Note that this requires fetching the docs with fetch(); otherwise an unhandledPromiseRejection is thrown.

client

This requires a local collection on the client that is not synced with your server-side collection; otherwise you will run into document syncing problems:

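import { Meteor } from 'meteor/meteor'
import { Mongo } from 'meteor/mongo'

const HugeData = new Mongo.Collection(null) // note the null as collection name: local only, never synced

// Insert the document, or replace the stored copy if the _id already exists.
const insertUpdate = document => {
  const { _id, ...fields } = document
  if (HugeData.findOne(_id)) {
    return HugeData.update(_id, fields)
  } else {
    return HugeData.insert(document)
  }
}

Meteor.call('hugeData', (err, data) => {
  if (err) {
    console.error(err)
    return
  }
  data.forEach(insertUpdate)
})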

Then you can use the local collection on the client for any projection of the received data.
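If reactivity is not needed, the array returned by the Method could also be sorted directly. The following is a sketch under assumptions: compareByField is a hypothetical helper, the sample documents are made up, and documents that lack the chosen field are placed last regardless of direction.

```javascript
// Hypothetical helper: comparator for a field that may be missing on some documents.
function compareByField (fieldName, descending = false) {
  const factor = descending ? -1 : 1
  return (a, b) => {
    const x = a[fieldName]
    const y = b[fieldName]
    const xMissing = x === undefined || x === null
    const yMissing = y === undefined || y === null
    if (xMissing && yMissing) return 0
    if (xMissing) return 1 // documents without the field go last
    if (yMissing) return -1
    if (x < y) return -1 * factor
    if (x > y) return 1 * factor
    return 0
  }
}

// Sample data standing in for the array returned by the Method.
const received = [
  { _id: '1', score: 5 },
  { _id: '2' },
  { _id: '3', score: 9 },
  { _id: '4', score: 1 }
]

// Copy before sorting so the received array stays untouched.
const sorted = [...received].sort(compareByField('score', true))
```

This only compares values of the same type sensibly; user data with mixed types in one field would need an explicit ordering rule.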

All in all, moving the load to the clients is a good tradeoff. As long as you keep them informed when projections take a while, it should be okay.

Upvotes: 1

Jithin Zacharia

Reputation: 371

MongoDB allows you to design the schema in such a way that it can store objects and object relations, so you can allow the user to store any kind of information. As @kevinadi said, there is a limit of 32MB. As far as sorting is concerned, it can be done on your server side.

This is an example I tried for storing objects in MongoDB with the Mongoose ODM:

var mongoose = require("mongoose");
var userSchema = new mongoose.Schema({
  email: {
    type: String,
    unique: true,
    required: true,
    lowercase: true,
    trim: true,
    match: [/^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/, "Please fill a valid email address"]
  },
  // Free-form object holding the user's custom data
  custInfo: {
    type: Object,
    required: true
  },
  isConfirmed: {
    type: Boolean,
    required: true,
    default: false
  },
  confirmedOn: {
    type: Date,
    required: true,
    // Pass the function itself so the default is evaluated per document,
    // not once when the schema is defined
    default: Date.now
  }
});

module.exports = mongoose.model("user", userSchema);

Upvotes: 1
