Mohit Kumar
Mohit Kumar

Reputation: 11

How to Optimize MongoDB Aggregation Pipeline with Keyword Search and Sorting for Large Dataset?

I am working on a Laravel project that uses MongoDB via MongoDB Atlas. I have a collection with millions of documents, and I am running a complex search query that includes keyword search, sorting, and grouping. However, the query is taking too long to execute and is not performing well. Here is the query I am using:

[
    {
        "$search": {
            "index": "keyword_search_index",
            "queryString": {
                "defaultPath": "name",
                "query": "((title: (\"product\") OR description: (\"product\") OR ai_tech_types.name: (\"product\") OR company.name: (\"product\") OR industry_focus.name: (\"product\") OR skills.name: (\"product\") OR title: (\"products\") OR description: (\"products\") OR ai_tech_types.name: (\"products\") OR company.name: (\"products\") OR industry_focus.name: (\"products\") OR skills.name: (\"products\")) OR (title: (\"data\") OR description: (\"data\") OR ai_tech_types.name: (\"data\") OR company.name: (\"data\") OR industry_focus.name: (\"data\") OR skills.name: (\"data\")) OR (title: (\"artificial intelligence\") OR description: (\"artificial intelligence\") OR ai_tech_types.name: (\"artificial intelligence\") OR company.name: (\"artificial intelligence\") OR industry_focus.name: (\"artificial intelligence\") OR skills.name: (\"artificial intelligence\") OR title: (\"artificial intelligences\") OR description: (\"artificial intelligences\") OR ai_tech_types.name: (\"artificial intelligences\") OR company.name: (\"artificial intelligences\") OR industry_focus.name: (\"artificial intelligences\") OR skills.name: (\"artificial intelligences\")) OR (title: (\"learning\") OR description: (\"learning\") OR ai_tech_types.name: (\"learning\") OR company.name: (\"learning\") OR industry_focus.name: (\"learning\") OR skills.name: (\"learning\") OR title: (\"learnings\") OR description: (\"learnings\") OR ai_tech_types.name: (\"learnings\") OR company.name: (\"learnings\") OR industry_focus.name: (\"learnings\") OR skills.name: (\"learnings\")))"
            }
        }
    },
    {
        "$match": {
            "post_date": {
                "$gte": {
                    "$date": "2024-06-18T18:30:00Z"
                },
                "$lte": {
                    "$date": "2024-07-18T18:29:59Z"
                }
            }
        }
    },
    {
        "$unwind": {
            "path": "$ai_tech_types"
        }
    },
    {
        "$group": {
            "_id": {
                "$toLower": "$ai_tech_types.code"
            },
            "name": {
                "$first": "$ai_tech_types.name"
            },
            "code": {
                "$first": "$ai_tech_types.code"
            },
            "count": {
                "$sum": 1
            }
        }
    },
    {
        "$project": {
            "name": 1,
            "code": 1,
            "count": 1
        }
    },
    {
        "$sort": {
            "count": -1
        }
    }
]


Requirements: Perform a text search across multiple fields. Filter documents based on a date range. Unwind an array field. Group by a specific field, count occurrences, and then sort by the count.

Issues: The query is taking too long to execute and often times out. The dataset is quite large, with millions of documents.

What I've Tried: Ensuring appropriate indexes are in place. Optimising the $search query.

Questions: How can I optimise this query for better performance given the large dataset? Are there any MongoDB Atlas-specific optimisation or configurations that I can leverage? Is there a more efficient way to structure the query or the data to achieve the same results?

Upvotes: 0

Views: 53

Answers (0)

Related Questions