Ranjith Ramachandra

Reputation: 10764

How to overcome the limitations of the MongoDB aggregation framework

The aggregation framework on MongoDB has certain limitations as per this link.

I want to remove restrictions 2 and 3.

I really do not care what the resulting set's size is. I have a lot of RAM and resources.

And I do not care if it takes more than 10% system resources.

I expect both 2 and 3 to be violated in my application, mostly 2.

But I really need the aggregation framework. Is there anything that I can do to remove these limitations?

The application I have been working on has these features:

  1. The user can upload a large dataset.
  2. We have a menu that lets the user sort, aggregate, etc.
  3. The aggregation currently has no restrictions, and the user can do whatever they want. Since the developer does not know the data in advance and the user can group by any number of columns, the application can error out.

Choosing something other than MongoDB is a no-go. We have already sunk too much into development with MongoDB.

Is it advisable to change the source code of Mongo?

Upvotes: 0

Views: 2275

Answers (1)

Artem Mezhenin

Reputation: 5757

1) Saving aggregated values directly to a collection (as with MapReduce) will be released in a future version, so the first solution is to just wait for a while :)

2) If you hit the 2nd or 3rd limitation, maybe you should redesign your data schema and/or aggregation pipeline. If you are working with large time series, you can reduce the number of aggregated docs and do the aggregation in several steps (like MapReduce does). I can't be more concrete, because I don't know your data or use cases (leave me a comment).

3) You can choose a different framework. If you are familiar with the MapReduce concept, you can try Hadoop (it can use MongoDB as a data source). I don't have experience with MongoDB-Hadoop integration, but I must warn you NOT to use Mongo's MapReduce -- it sucks hard on large datasets.

4) You can do the aggregation inside your own code, but you should use some "low-level" language or library. For example, pymongo (http://api.mongodb.org/python/current/) is not suitable for such things, but you can try something like monary (https://bitbucket.org/djcbeach/monary/wiki/Home) to efficiently extract the data, and NumPy or Pandas to aggregate it the way you want.
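
To make the "several steps" idea from 2) concrete when done in your own code as in 4), here is a minimal sketch. It assumes the documents arrive as plain Python dicts from some cursor-like iterable. The helper names (`chunked`, `partial_aggregate`, `merge_into`, `aggregate_in_steps`) and the field names are made up for illustration; none of this is a MongoDB API. Each chunk is reduced to a small partial result and the partials are merged, so only the per-group totals stay in memory, not the whole input.

```python
from collections import defaultdict
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def partial_aggregate(docs, group_keys, value_key):
    """Step 1: reduce one chunk to {group: [count, sum]}."""
    acc = defaultdict(lambda: [0, 0.0])
    for doc in docs:
        # The group is a tuple, so the user can group by any number of columns.
        key = tuple(doc.get(k) for k in group_keys)
        acc[key][0] += 1
        acc[key][1] += doc[value_key]
    return acc


def merge_into(total, partial):
    """Step 2: fold a partial result into the running totals."""
    for key, (count, subtotal) in partial.items():
        total[key][0] += count
        total[key][1] += subtotal
    return total


def aggregate_in_steps(docs, group_keys, value_key, chunk_size=1000):
    """Aggregate chunk by chunk, then derive count, sum and average."""
    total = defaultdict(lambda: [0, 0.0])
    for chunk in chunked(docs, chunk_size):
        merge_into(total, partial_aggregate(chunk, group_keys, value_key))
    return {
        key: {"count": count, "sum": subtotal, "avg": subtotal / count}
        for key, (count, subtotal) in total.items()
    }


# Hypothetical sample data; in practice `docs` would be a cursor.
docs = [
    {"city": "A", "year": 2012, "amount": 10},
    {"city": "A", "year": 2012, "amount": 30},
    {"city": "B", "year": 2012, "amount": 5},
]
result = aggregate_in_steps(docs, ["city", "year"], "amount", chunk_size=2)
```

Pure Python like this will be slow on really large inputs. The same partial-then-merge structure carries over if the per-chunk step is done with NumPy or Pandas instead.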

Upvotes: 2
