Reputation: 1573
I am wondering whether writing $project
just after the $match
stage actually decreases the amount of data kept in memory. As an example, if we want an array element with paging from a user document like the following:
const skip = 20;
const limit = 50;
UserModel.aggregate([
{ $match: { _id: userId } },
{ $project: { _id: 0, postList: { $slice: ["$postList", skip, limit] } } },
{ $lookup: ...
]);
Assume that there are other lists in the user document and they are very large in size.
So, will $project
help to improve performance by not loading the other large lists into memory?
Upvotes: 2
Views: 2122
Reputation: 14317
Each aggregation stage scans the input documents from the collection (if it is the first stage) or from the previous stage.
Each stage can affect memory or CPU, or both. In general, the document size, the number of documents, the indexes, and the available memory can all affect query performance.
The memory restrictions for aggregation are clearly specified in the documentation (see Aggregation Pipeline Limits). If a stage exceeds the memory limit, the aggregation terminates. In such cases you can specify the aggregation option { allowDiskUse: true }
, and using this option will affect the query performance. If your aggregation runs without any memory-related issues (such as query termination due to exceeding the memory limits), then there is no direct issue with your query performance.
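As a minimal sketch (not from the original answer), allowDiskUse is passed as an aggregation *option*, not as a pipeline stage; the field names "status" and "createdAt" below are assumptions for illustration:

```javascript
// Hypothetical pipeline; a large $sort is what typically hits the memory limit.
const pipeline = [
  { $match: { status: "active" } },
  { $sort: { createdAt: -1 } },
];

// The option goes alongside the pipeline, not inside it.
// With Mongoose / the Node.js driver: UserModel.aggregate(pipeline, options);
const options = { allowDiskUse: true };
```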
The $match
and $sort
stages use indexes, if used early in the pipeline. And this can improve performance.
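A sketch of that ordering, assuming an index exists on a hypothetical "userId" field: placing $match (and an adjacent $sort) at the very start lets the planner run them against the collection's indexes.

```javascript
// $match as the first stage can use an index on userId; a $sort placed
// directly after it at the start of the pipeline can use an index too.
const pipeline = [
  { $match: { userId: 42 } },
  { $sort: { createdAt: -1 } },
  { $project: { _id: 0, postList: 1 } },
];
```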
Adding a stage to a pipeline means extra processing, which can affect the overall performance, because the documents from the previous stage have to pass through this extra stage. In an aggregation pipeline the documents flow through each stage, as in a pipe, and each stage performs some data transformation. If you can avoid a stage, it can sometimes benefit the overall query performance. When the numbers are large, an unnecessary extra stage is definitely a disadvantage. You have to take into consideration the memory restrictions as well as the size and the number of documents.
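As an illustration (my own sketch, with an assumed "name" field): two consecutive $project stages do the same work as one merged stage, and the merged form avoids an extra pass over every document.

```javascript
// Redundant: every document passes through two projection stages.
const twoProjections = [
  { $project: { _id: 0, name: 1, postList: 1 } },
  { $project: { postList: 1 } },
];

// Equivalent output with one stage fewer.
const mergedProjection = [
  { $project: { _id: 0, postList: 1 } },
];
```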
A $project
can be used to reduce the size of the documents. But is it necessary to add this stage? That depends on the factors mentioned above, on your implementation, and on your application. The documentation (Projection Optimization) says:
The aggregation pipeline can determine if it requires only a subset of the fields in the documents to obtain the results. If so, the pipeline will only use those required fields, reducing the amount of data passing through the pipeline.
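A sketch of the quoted optimization: even without an explicit $project, the pipeline below only depends on _id (for the match) and postList, so the optimizer can avoid pulling the document's other large arrays through the pipeline.

```javascript
const userId = "someUserId"; // placeholder value for illustration

// No $project stage, yet only _id and postList are required fields.
const pipeline = [
  { $match: { _id: userId } },
  { $unwind: "$postList" },
  { $count: "postCount" },
];
```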
Upvotes: 2