Ken Williams

Reputation: 24005

MongoDB - aggregate to another collection?

I have a process that I'm currently using Mongo's Map/Reduce framework for, but it's not performing very well. It's a pretty simple aggregation, where I bucketize over 3 fields, returning the sum of 4 different fields, and passing through the values for another 4 fields (which are constant within each bucket).

For reasons described in [ Map-Reduce performance in MongoDb 2.2, 2.4, and 2.6 ], I'd like to convert this to the aggregation framework for better performance, but there are 3 things standing in the way, I think:

  1. The total result can be large, exceeding Mongo's 16MB limit, even though any one document in the result is very small.
  2. I can map/reduce directly to another collection, but the aggregation framework can only return results inline (I think?)
  3. For incremental updates as more data arrives in the source collection, I can map/reduce with MapReduceCommand.OutputType (in Java) set to REDUCE, exactly matching my use case, but I don't see a corresponding functionality in the aggregation framework.

Are there good ways to solve these in the aggregation framework? The server is version 2.4.3 right now - we can probably update as needed if there are new capabilities.

Upvotes: 4

Views: 4796

Answers (2)

lesolorzanov

Reputation: 3615

You can do that now with $out (available since MongoDB 2.6), as explained in the MongoDB documentation:

$out Takes the documents returned by the aggregation pipeline and writes them to a specified collection. The $out operator lets the aggregation framework return result sets of any size. The $out operator must be the last stage in the pipeline.

The command has the following syntax, where <output-collection> is the collection that will hold the output of the aggregation operation. $out is only permissible at the end of the pipeline:

db.<collection>.aggregate( [
     { <operation> },
     { <operation> },
     ...,
     { $out : "<output-collection>" }
] )
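For the aggregation described in the question (bucket on 3 fields, sum 4 fields, pass 4 constant fields through), the pipeline might look roughly like the sketch below. All field names (`a`, `b`, `c`, `s1`..`s4`, `p1`..`p4`, `ts`) and the collection names are hypothetical placeholders, not taken from the question. The second builder uses `$merge`, which only exists in MongoDB 4.2 and later, as one possible stand-in for map/reduce's REDUCE output mode. Treat it as an assumption-laden sketch rather than a drop-in replacement.

```javascript
// Hypothetical field names; replace with the real ones.
const BUCKET_KEYS = ["a", "b", "c"];            // fields that define a bucket
const SUM_FIELDS = ["s1", "s2", "s3", "s4"];    // fields summed per bucket
const PASS_FIELDS = ["p1", "p2", "p3", "p4"];   // constant within a bucket

// Build the $group stage: compound _id, $sum for totals, $first for pass-through.
function groupStage() {
  const group = { _id: {} };
  for (const k of BUCKET_KEYS) group._id[k] = "$" + k;
  for (const f of SUM_FIELDS) group[f] = { $sum: "$" + f };
  for (const f of PASS_FIELDS) group[f] = { $first: "$" + f };
  return { $group: group };
}

// Full rebuild: $out replaces the target collection with the results (2.6+).
function buildOutPipeline(outputCollection) {
  return [groupStage(), { $out: outputCollection }];
}

// Incremental update (MongoDB 4.2+): aggregate only documents newer than
// `since` (assumes a hypothetical timestamp field `ts`) and add the new
// partial sums onto any bucket that already exists in the target.
function buildMergePipeline(outputCollection, since) {
  const addNew = {};
  for (const f of SUM_FIELDS) addNew[f] = { $add: ["$" + f, "$$new." + f] };
  return [
    { $match: { ts: { $gt: since } } },
    groupStage(),
    {
      $merge: {
        into: outputCollection,
        on: "_id",
        whenMatched: [{ $set: addNew }],  // existing bucket: add new sums
        whenNotMatched: "insert"          // new bucket: insert as-is
      }
    }
  ];
}
```

In the shell this would be used as, for example, `db.events.aggregate(buildOutPipeline("bucket_totals"))`, where `events` and `bucket_totals` are again made-up names. Note that `$out` overwrites the target collection on every run, so by itself it covers points 1 and 2 of the question but not the incremental case.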

Upvotes: 4

Sai

Reputation: 461

The aggregation framework currently cannot write its output to another collection directly. However, you can try the answer in this discussion: SO-questions-output aggregate to new collection. Map/reduce is much slower, and I too have been waiting for a solution. You could also try the Hadoop-to-MongoDB connector, which is supported on the MongoDB website. Hadoop is faster at map/reduce, but I do not know whether it would suit your specific case.

Link to hadoop + MongoDB connector

All the best.

Upvotes: 1
