Pipeline vs MapReduce in MongoDB

Question

When do I use prefer MapReduce over Pipeline in MongoDB or vice versa? I feel most of the aggregation operations are suitable for pipeline. What kind of complexity of the problem or what use case should make me go for MapReduce.

Philipp · Accepted Answer

As a general rule of thumb: When you can do it with the aggregation pipeline, you should.

One reason is that the aggregation pipeline is able to use indexes and internal optimizations between the aggregation steps which are just not possible with MapReduce.

Aggregation is also a lot more secure when the operation is triggered by user input. When there are any user-supplied parameters to your query, MapReduce forces you to create javascript functions through string concatenation. This opens the door for dangerous Javascript code injection vulnerabilities. The APIs used for creating aggregation pipeline objects (in most programming languages!) usually has fewer such obvious pitfalls.

There are, however, still a few cases which can not be done easily or not at all with aggregation. For these cases, MapReduce has still a reason to exist.

Another limitation of the aggregation framework is that the intermediate dataset after each aggregation step is limited to 100MB unless you use the allowDiskUse option, which really slows down the query. MapReduce usually behaves a lot better when you need to work with a really large dataset.

Pipeline vs MapReduce in MongoDB

Answers (1)

Related Questions