Reputation: 1573
I am wondering whether writing $project
just after the $match
stage actually decreases the amount of data kept in memory. As an example, if we want an array element with paging from a user document like the following:
const skip = 20;
const limit = 50;
UserModel.aggregate([
{ $match: { _id: userId } },
{ $project: { _id: 0, postList: { $slice: ["$postList", skip, limit] } } },
{ $lookup: ...
]);
Assume that there are other lists in the user document and they are very large in size.
So, will $project
help to improve performance by not loading the other large lists into memory?
Upvotes: 2
Views: 2122
Reputation: 14317
Each aggregation stage scans the input documents from the collection (if it is the first stage) or from the previous stage.
Each stage can affect memory or CPU, or both. In general, the document size, the number of documents, the indexes, and the available memory can all affect query performance.
The memory restrictions for aggregation are clearly specified in the documentation (see Aggregation Pipeline Limits). If a stage exceeds the memory limit, the aggregation terminates. In such cases you can specify the aggregation option { allowDiskUse: true }
, and using this option will affect the query performance. If your aggregation runs without any memory-related issues (such as query termination due to exceeding the memory limits), then there is no direct issue with your query performance.
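As a minimal sketch (not from the original answer), allowDiskUse is passed as an aggregation *option*, not as a pipeline stage; the field names "status" and "createdAt" below are assumptions for illustration:

```javascript
// Hypothetical pipeline; a large $sort is what typically hits the memory limit.
const pipeline = [
  { $match: { status: "active" } },
  { $sort: { createdAt: -1 } },
];

// The option goes alongside the pipeline, not inside it.
// With Mongoose / the Node.js driver: UserModel.aggregate(pipeline, options);
const options = { allowDiskUse: true };
```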
The $match
and $sort
stages use indexes, if used early in the pipeline. And this can improve performance.
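A sketch of that ordering, assuming an index exists on a hypothetical "userId" field: placing $match (and an adjacent $sort) at the very start lets the planner run them against the collection's indexes.

```javascript
// $match as the first stage can use an index on userId; a $sort placed
// directly after it at the start of the pipeline can use an index too.
const pipeline = [
  { $match: { userId: 42 } },
  { $sort: { createdAt: -1 } },
  { $project: { _id: 0, postList: 1 } },
];
```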
Adding a stage to a pipeline means extra processing, which can affect the overall performance, because the documents from the previous stage have to pass through this extra stage. In an aggregation pipeline the documents flow through each stage, as in a pipe, and each stage performs some data transformation. If you can avoid a stage, it can sometimes benefit the overall query performance. When the numbers are large, an unnecessary extra stage is definitely a disadvantage. You have to take into consideration the memory restrictions as well as the size and the number of documents.
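As an illustration (my own sketch, with an assumed "name" field): two consecutive $project stages do the same work as one merged stage, and the merged form avoids an extra pass over every document.

```javascript
// Redundant: every document passes through two projection stages.
const twoProjections = [
  { $project: { _id: 0, name: 1, postList: 1 } },
  { $project: { postList: 1 } },
];

// Equivalent output with one stage fewer.
const mergedProjection = [
  { $project: { _id: 0, postList: 1 } },
];
```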
A $project
can be used to reduce the size of the documents. But is it necessary to add this stage? That depends on the factors mentioned above, on your implementation, and on your application. The documentation (Projection Optimization) says:
The aggregation pipeline can determine if it requires only a subset of the fields in the documents to obtain the results. If so, the pipeline will only use those required fields, reducing the amount of data passing through the pipeline.
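A sketch of the quoted optimization: even without an explicit $project, the pipeline below only depends on _id (for the match) and postList, so the optimizer can avoid pulling the document's other large arrays through the pipeline.

```javascript
const userId = "someUserId"; // placeholder value for illustration

// No $project stage, yet only _id and postList are required fields.
const pipeline = [
  { $match: { _id: userId } },
  { $unwind: "$postList" },
  { $count: "postCount" },
];
```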
Upvotes: 2