matheuscscp

Reputation: 878

MongoDB $text operator matching documents where the search string is a substring

I'm aware that the $text operator doesn't work with regexes, but I need a search that works as follows.

Documents:

{ "field1": "some content", "field2": "another content"}

{ "field1": "yet one more content", "field2": "the final content"}

If we search for the string "ye ano", both documents should be in the results, because "ye" occurs in the second document and "ano" occurs in the first.

A workaround using the $text operator would be really appreciated, because of its case and diacritic insensitivity.

I would also accept something whose behavior is close, if not identical. My main concern is efficiency: I already have an O(n lg n) solution, but it is too expensive for a search.
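To pin down the semantics being asked for, here is a minimal in-memory sketch in plain JavaScript (the mongo shell's language). The names substringSearch and fold are hypothetical, and this is a linear scan over all documents, meant only as a reference for checking a real query's results, not as an efficient solution. A document matches if any whitespace-separated term is a substring of any listed field, ignoring case and (via Unicode NFD normalization) diacritics.

```javascript
// Fold case and strip combining diacritical marks, e.g. "Café" -> "cafe".
function fold(s) {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Hypothetical reference implementation: linear scan, no index involved.
// A doc matches if ANY term is a substring of ANY of the given fields.
function substringSearch(docs, search, fields) {
  const terms = search.split(/\s+/).filter(t => t.length > 0).map(fold);
  return docs.filter(doc =>
    fields.some(f => typeof doc[f] === "string" &&
      terms.some(t => fold(doc[f]).includes(t))));
}

const docs = [
  { field1: "some content", field2: "another content" },
  { field1: "yet one more content", field2: "the final content" },
];

console.log(substringSearch(docs, "ye ano", ["field1", "field2"]).length); // 2
```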

Upvotes: 2

Views: 824

Answers (1)

profesor79

Reputation: 9473

It looks like we need an $or that runs a regex search on both fields. First, create indexes :-)

db.math.createIndex({field1:1})

db.math.createIndex({field2:1})

Then add [0-4] to the pattern to search for the first characters in the text field; if you omit it, a collection scan (COLLSCAN) will occur:

db.math.find({$or:[{"field1":{$regex:/ye|ano[0-4]/}},{"field2":{$regex:/ye|ano[0-4]/}}]}).pretty()
{
        "_id" : ObjectId("56d0c236854cc0de43173fa6"),
        "field1" : "some content",
        "field2" : "another content"
}
{
        "_id" : ObjectId("56d0c24b854cc0de43173fa7"),
        "field1" : "yet one more content",
        "field2" : "the final content"
}
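The query above hardcodes both terms. As a sketch of building the same kind of $or filter from an arbitrary search string, here is a plain JavaScript helper; buildSubstringFilter and escapeRegex are hypothetical names, and the returned object is assumed to be passed to find() in the shell or the Node.js driver. Each term is regex-escaped and the terms are joined into one case-insensitive alternation. Alternation has the lowest precedence in a regex, so in /ye|ano[0-4]/ the [0-4] class applies only to the ano branch; the plain alternation /ye|ano/i matches either term anywhere in the field. The i flag gives case insensitivity but not diacritic insensitivity, and an unanchored regex still has to examine every key in the index.

```javascript
// Escape regex metacharacters so a term like "a.b" is matched literally.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: filter matching docs where any term is a substring
// of any listed field. Returns null for an empty search (an empty $or is
// rejected by MongoDB), leaving that case to the caller.
function buildSubstringFilter(search, fields) {
  const terms = search.split(/\s+/).filter(t => t.length > 0).map(escapeRegex);
  if (terms.length === 0) return null;
  const pattern = new RegExp(terms.join("|"), "i"); // unanchored, case-insensitive
  return { $or: fields.map(f => ({ [f]: { $regex: pattern } })) };
}

const filter = buildSubstringFilter("ye ano", ["field1", "field2"]);
console.log(filter.$or[0].field1.$regex); // /ye|ano/i
// e.g. db.math.find(filter) in the shell, or collection.find(filter) in Node.js
```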

More importantly, explain() shows that the indexes are used for the search (IXSCAN stages):

    db.math.find({$or:[{"field1":{$regex:/ye|ano[0-4]/}},{"field2":{$regex:/ye|ano[0-4]/}}]}).explain()
    {
            "queryPlanner" : {
                    "plannerVersion" : 1,
                    "namespace" : "test.math",
                    "indexFilterSet" : false,
                    "parsedQuery" : {
                            "$or" : [
                                    {
                                            "field1" : /ye|ano[0-4]/
                                    },
                                    {
                                            "field2" : /ye|ano[0-4]/
                                    }
                            ]
                    },
                    "winningPlan" : {
                            "stage" : "SUBPLAN",
                            "inputStage" : {
                                    "stage" : "FETCH",
                                    "inputStage" : {
                                            "stage" : "OR",
                                            "inputStages" : [
                                                    {
                                                            "stage" : "IXSCAN",
                                                            "filter" : {
                                                                    "$or" : [
                                                                            {
                                                                                "field1" : /ye|ano[0-4]/
                                                                            }
                                                                    ]
                                                            },
                                                            "keyPattern" : {
                                                                    "field1" : 1,
                                                                    "field2" : 1
                                                            },
                                                            "indexName" : "field1_1_field2_1",
                                                            "isMultiKey" : false,
                                                            "direction" : "forward",
                                                            "indexBounds" : {
                                                                    "field1" : [
                                                                            "[\"\",{})",
                                                                            "[/ye|ano[0-4]/, /ye|ano[0-4]/]"
                                                                    ],
                                                                    "field2" : [
                                                                            "[MinKey, MaxKey]"
                                                                    ]
                                                            }
                                                    },
                                                    {
                                                            "stage" : "IXSCAN",
                                                            "filter" : {
                                                                    "$or" : [
                                                                            {
                                                                                "field2" : /ye|ano[0-4]/
                                                                            }
                                                                    ]
                                                            },
                                                            "keyPattern" : {
                                                                    "field2" : 1
                                                            },
                                                            "indexName" : "field2_1",
                                                            "isMultiKey" : false,
                                                            "direction" : "forward",
                                                            "indexBounds" : {
                                                                    "field2" : [
                                                                            "[\"\",{})",
                                                                            "[/ye|ano[0-4]/,/ye|ano[0-4]/]"
                                                                    ]
                                                            }
                                                    }
                                            ]
                                    }
                            }
                    },
                    "rejectedPlans" : [ ]
            },
            "serverInfo" : {
                    "host" : "greg",
                    "port" : 27017,
                    "version" : "3.0.8",
                    "gitVersion" : "83d8cc25e00e42856924d84e220fbe4a839e605d"
            },
            "ok" : 1
    }
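To check plans like this one programmatically, a small helper can walk the winningPlan tree. This is a sketch: collectStages is a hypothetical name, and it relies only on the stage, inputStage and inputStages keys visible in the explain() output above. A COLLSCAN entry would indicate a collection scan. Note that an IXSCAN whose bounds are ["", {}), as here, still reads every string key in that index.

```javascript
// Hypothetical helper: collect every "stage" name in an explain() plan tree,
// following inputStage (single child) and inputStages (array of children).
function collectStages(plan, out = []) {
  if (!plan || typeof plan !== "object") return out;
  if (plan.stage) out.push(plan.stage);
  if (plan.inputStage) collectStages(plan.inputStage, out);
  for (const child of plan.inputStages || []) collectStages(child, out);
  return out;
}

// Trimmed copy of the winningPlan above (filters and bounds omitted).
const winningPlan = {
  stage: "SUBPLAN",
  inputStage: {
    stage: "FETCH",
    inputStage: {
      stage: "OR",
      inputStages: [
        { stage: "IXSCAN", indexName: "field1_1_field2_1" },
        { stage: "IXSCAN", indexName: "field2_1" },
      ],
    },
  },
};

const stages = collectStages(winningPlan);
console.log(stages); // [ 'SUBPLAN', 'FETCH', 'OR', 'IXSCAN', 'IXSCAN' ]
console.log(stages.includes("COLLSCAN")); // false
```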

Upvotes: 1
